System, method, apparatus and computer-readable media for bit allocation for redundant transmission
Patent abstract:
SYSTEM, METHOD, APPARATUS AND COMPUTER-READABLE MEDIA FOR BIT ALLOCATION FOR REDUNDANT TRANSMISSION OF AUDIO DATA. Reallocation, based on compressibility, of initial bit allocations for frames of an audio signal is described. Application to redundancy-based retransmission of critical frames (for example, for fixed bit rate modes of speech codec operation) is also described.
Publication number: BR112014017119B1
Application number: R112014017119-0
Filing date: 2012-12-20
Publication date: 2020-12-22
Inventors: Vivek Rajendran; Venkatesh Krishnan; Daniel J. Sinder
Applicant: Qualcomm Incorporated
Patent description:
[0001] This patent application claims priority to Provisional Application No. 61/586,007, entitled "SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR BIT ALLOCATION FOR REDUNDANT TRANSMISSION", filed on January 12, 2012, and assigned to the assignee hereof. This patent application also claims priority to Provisional Application No. 61/587,507, entitled "SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR CRITICALITY THRESHOLD CONTROL", filed on January 17, 2012, and assigned to the assignee hereof. This patent application also claims priority to Provisional Application No. 61/641,093, entitled "SYSTEMS, METHODS, APPARATUS, AND COMPUTER-READABLE MEDIA FOR BIT ALLOCATION FOR REDUNDANT TRANSMISSION", filed on May 1, 2012, and assigned to the assignee hereof.
BACKGROUND
Field
[0002] This disclosure relates to audio communications.
Background
[0003] Digital audio telecommunications have historically been carried out over circuit-switched networks. A circuit-switched network is a network in which a physical path is established between two terminals for the duration of a call. In circuit-switched applications, a transmitting terminal sends a sequence of packets containing audio information (for example, voice) over the physical path to the receiving terminal. The receiving terminal uses the audio information contained in the packets (for example, voice information) to synthesize the corresponding audio signal (for example, a voice signal).
[0004] Digital audio telecommunications have begun to be carried out over packet-switched networks. A packet-switched network is a network in which packets are routed through the network based on a destination address. With packet-switched communications, routers determine a route for each packet individually, sending it along any available route to reach its destination. As a result, the packets may not arrive at the receiving terminal at the same time or in the same order. A de-jitter buffer can be used at the receiving terminal to put the packets back in order and play them out in a continuous sequential manner.
[0005] On some occasions, a packet is lost in transit from the transmitting terminal to the receiving terminal. A lost packet can degrade the quality of the synthesized audio signal. As such, benefits can be realized through systems and methods for mitigating the loss of information within a frame (for example, within a speech frame).
SUMMARY
[0006] A method of processing an audio signal according to a general configuration includes calculating at least one value of a decision metric for a second frame of the audio signal that is subsequent in the audio signal to a first frame (for example, a critical frame) of the audio signal. This method also includes selecting one from among a plurality of reallocation candidates, based on the at least one calculated value of the decision metric. In this method, the at least one calculated value is based on a measure of compressibility of the second frame, and the selected reallocation candidate indicates a reallocation of an initial bit allocation for the second frame into a first portion and a second portion. Computer-readable storage media (for example, non-transitory media) having tangible features that cause a machine reading the features to perform such a method are also disclosed.
[0007] An apparatus for processing an audio signal according to another general configuration includes means for calculating at least one value of a decision metric for a second frame of the audio signal that is subsequent in the audio signal to a first frame (for example, a critical frame) of the audio signal. This apparatus also includes means for selecting one from among a plurality of reallocation candidates, based on said at least one calculated value of the decision metric. In this apparatus, the at least one calculated value is based on a measure of compressibility of the second frame, and the selected reallocation candidate indicates a reallocation of an initial bit allocation for the second frame into a first portion and a second portion.
[0008] An apparatus for processing an audio signal according to another general configuration includes a calculator configured to calculate at least one value of a decision metric for a second frame of the audio signal that is subsequent in the audio signal to a first frame (for example, a critical frame) of the audio signal. This apparatus also includes a selector configured to select one from among a plurality of reallocation candidates, based on said at least one calculated value of the decision metric. In this apparatus, the at least one calculated value is based on a measure of compressibility of the second frame, and the selected reallocation candidate indicates a reallocation of an initial bit allocation for the second frame into a first portion and a second portion.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] Figure 1A is a block diagram illustrating an example of a transmitting terminal 102 and a receiving terminal 104, which communicate via a network NW10.
[0010] Figure 1B shows a block diagram of an implementation AE20 of audio encoder AE10.
[0011] Figure 2 shows examples of different terminal devices that can communicate with each other over a network NW20.
[0012] Figure 3 shows a block diagram of a basic implementation FE20 of frame encoder FE10.
[0013] Figure 4 is a block diagram illustrating an example of an implementation 112 of the transmitting terminal and an implementation 114 of the receiving terminal.
[0014] Figure 5A shows a flow chart of a method M100 according to a general configuration.
[0015] Figure 5B shows a flow chart of an implementation M200 of method M100.
[0016] Figure 5C shows a flow chart of an implementation M210 of method M200.
[0017] Figure 6A shows an example of a sequence of frames of an audio signal.
[0018] Figure 6B shows a correspondence between ranges of the value of decision metric D and a plurality of reallocation candidates.
[0019] Figure 6C shows a flow chart of an implementation M220 of method M200.
[0020] Figure 7A shows a flow chart of an implementation M300 of method M100.
[0021] Figure 7B shows a flow chart of an implementation M310 of method M300.
[0022] Figure 8A shows a flow chart of an implementation M400 of method M100.
[0023] Figure 8B shows a flow chart of an implementation M410 of method M400.
[0024] Figure 9A shows a flow chart of an implementation M420 of method M400.
[0025] Figure 9B shows a flow chart of an implementation M430 of method M400.
[0026] Figure 10A shows a flow chart of an implementation M500 of method M400.
[0027] Figure 10B shows a flow chart of an implementation M510 of method M500.
[0028] Figure 11A shows a flow chart of an implementation M520 of method M500.
[0029] Figure 11B shows a flow chart of an implementation M530 of method M500.
[0030] Figure 12 shows a flow chart of an implementation M540 of method M500.
[0031] Figure 13A shows a flow chart of an implementation M110 of method M100.
[0032] Figure 13B shows a flow chart of an implementation M120 of method M110.
[0033] Figure 13C shows a flow chart of an implementation M130 of method M120.
[0034] Figures 14A and 14B show examples of relationships between channel status information and other system parameters, as described herein.
[0035] Figure 15A shows a flow chart of an implementation M140 of method M120.
[0036] Figure 15B shows a flow chart of an implementation M150 of methods M130 and M140.
[0037] Figure 16A shows a flow chart of an implementation M600 of method M100.
[0038] Figure 16B shows a flow chart of an implementation M610 of method M600.
[0039] Figure 16C shows a flow chart of an implementation M620 of method M600.
[0040] Figure 17A shows a flow chart of an implementation M630 of method M600.
[0041] Figure 17B shows a flow chart of an implementation M640 of method M600.
[0042] Figure 17C shows a flow chart of an implementation M650 of method M600.
[0043] Figure 18A shows a flow chart of an implementation M660 of methods M400 and M610.
[0044] Figure 18B shows a flow chart of an implementation M670 of methods M400 and M620.
[0045] Figure 18C shows a flow chart of an implementation M700 of method M600.
[0046] Figure 19A shows a flow chart of an implementation M710 of methods M660 and M700.
[0047] Figure 19B shows a flow chart of an implementation M720 of methods M670 and M700.
[0048] Figure 20A is a diagram of an IPv4 packet.
[0049] Figure 20B is a diagram of an IPv6 packet.
[0050] Figure 20C shows a block diagram of a communications device D10.
[0051] Figure 21 shows an example of an RTP packet payload carrying a redundant copy of a critical frame and a copy of a subsequent frame.
[0052] Figure 22 is a block diagram of an implementation AD20 of audio decoder AD10.
[0053] Figure 23A shows a block diagram of an apparatus MF100 according to a general configuration.
[0054] Figure 23B shows a block diagram of an implementation MF300 of apparatus MF100.
[0055] Figure 23C shows a block diagram of an implementation MF500 of apparatus MF100.
[0056] Figure 24A shows a block diagram of an implementation MF140 of apparatus MF100.
[0057] Figure 24B shows a block diagram of an implementation MF150 of apparatus MF140.
[0058] Figure 25A shows a block diagram of an apparatus A100 according to a general configuration.
[0059] Figure 25B shows a block diagram of an implementation A300 of apparatus A100.
[0060] Figure 25C shows a block diagram of an implementation A500 of apparatus A100.
[0061] Figure 25D shows a block diagram of a wireless device 1102.
[0062] Figure 26 shows front, rear, and side views of a device H100.
DETAILED DESCRIPTION
[0063] It may be desirable to improve the robustness of a fixed bit rate scheme to loss of information during transmission. Systems, methods and apparatus as described herein can be applied to adaptive redundant coding of critical frames of an audio signal. Such adaptive coding can include evaluating a plurality of split rates (for example, split bit allocations) and frame offsets. Such adaptive coding can also include determining that a frame is a critical frame.
[0064] Unless expressly limited by its context, the term "signal" is used herein to indicate any of its ordinary meanings, including a state of a memory location (or set of memory locations) as expressed on a wire, bus, or other transmission medium. Unless expressly limited by its context, the term "generating" is used herein to indicate any of its ordinary meanings, such as computing or otherwise producing. Unless expressly limited by its context, the term "calculating" is used herein to indicate any of its ordinary meanings, such as computing, evaluating, smoothing, and/or selecting from a plurality of values. Unless expressly limited by its context, the term "obtaining" is used to indicate any of its ordinary meanings, such as calculating, deriving, receiving (for example, from an external device), and/or retrieving (for example, from an array of storage elements). Unless expressly limited by its context, the term "selecting" is used to indicate any of its ordinary meanings, such as identifying, indicating, applying, and/or using at least one, and fewer than all, of a set of two or more. Where the term "comprising" is used in the present description and claims, it does not exclude other elements or operations. The term "based on" (as in "A is based on B") is used to indicate any of its ordinary meanings, including the cases (i) "derived from" (for example, "B is a precursor of A"), (ii) "based on at least" (for example, "A is based on at least B") and, if appropriate in the particular context, (iii) "equal to" (for example, "A is equal to B"). Similarly, the term "in response to" is used to indicate any of its ordinary meanings, including "in response to at least". Unless otherwise indicated, the terms "at least one of A, B, and C" and "one or more of A, B, and C" indicate "A and/or B and/or C."
[0065] Unless otherwise specified, the term "series" is used to indicate a sequence of two or more items. The term "logarithm" is used to indicate the base-ten logarithm, although extensions of such an operation to other bases are within the scope of this disclosure. The term "frequency component" is used to indicate one among a set of frequencies or frequency bands of a signal, such as a sample of a frequency-domain representation of the signal (for example, as produced by a fast Fourier transform or MDCT) or a subband of the signal (for example, a Bark-scale or mel-scale subband).
[0066] Unless otherwise indicated, any disclosure of an operation of an apparatus having a particular feature is also expressly intended to disclose a method having an analogous feature (and vice versa), and any disclosure of an operation of an apparatus according to a particular configuration is also expressly intended to disclose a method according to an analogous configuration (and vice versa). The term "configuration" may be used in reference to a method, apparatus and/or system, as indicated by the particular context. The terms "method", "process", "procedure" and "technique" are used generically and interchangeably, unless otherwise indicated by the particular context. A "task" having multiple subtasks is also a method. The terms "apparatus" and "device" are also used generically and interchangeably, unless otherwise indicated by the particular context. The terms "element" and "module" are typically used to indicate a portion of a greater configuration.
Unless expressly limited by its context, the term "system" is used herein to indicate any of its ordinary meanings, including "a group of elements that interact to serve a common purpose". The term "plurality" means "two or more". Any incorporation by reference of a portion of a document shall also be understood to incorporate definitions of terms or variables that are referenced within the portion, where such definitions appear elsewhere in the document, as well as any figures referenced in the incorporated portion.
[0067] The terms "coder", "codec" and "coding system" are used interchangeably to denote a system that includes at least one encoder configured to receive and encode frames of an audio signal (possibly after one or more preprocessing operations, such as a perceptual weighting and/or other filtering operation) and a corresponding decoder configured to produce decoded representations of the frames. Such an encoder and decoder are typically deployed at opposite ends of a communication link. In order to support full-duplex communication, instances of both the encoder and the decoder are typically deployed at each end of such a link.
[0068] Unless otherwise indicated, the terms "codec", "vocoder", "audio coder", and "speech coder" refer to the combination of an audio encoder and a corresponding audio decoder. Unless otherwise indicated, the term "coding" indicates transfer of an audio signal via a codec, including encoding and subsequent decoding. Unless otherwise indicated, the term "transmitting" indicates propagating (for example, a signal) into a transmission channel.
[0069] A coding scheme as described herein can be applied to encode any audio signal (for example, including non-speech audio). Alternatively, it may be desirable to use such a coding scheme for speech only. In such a case, the coding scheme can be used with a classification scheme to determine the content type of each frame of the audio signal and to select an appropriate coding scheme.
[0070] A coding scheme as described herein can be used as a primary codec or as a layer or stage in a multilayer or multistage codec. In one example, such a coding scheme is used to encode a portion of the frequency content of an audio signal (for example, a lowband or a highband), and another coding scheme is used to encode another portion of the frequency content of the signal. In another such example, such a coding scheme is used to encode an audio signal that is a residual (that is, an error between the original and encoded signals) of another coding layer, such as a residual of a linear prediction coding (LPC) analysis.
[0071] The methods, systems, and apparatus as described herein can be configured to process the audio signal as a series of segments. Typical segment lengths range from about five or ten milliseconds to about forty or fifty milliseconds, and the segments can be overlapping (for example, with adjacent segments overlapping by 25% or 50%) or non-overlapping. In one particular example, the audio signal is divided into a series of non-overlapping segments, or "frames", each ten milliseconds long. In another particular example, each frame is twenty milliseconds long. Examples of sampling rates for the audio signal include (without limitation) eight, twelve, sixteen, 32, 44.1, 48, and 192 kilohertz.
[0072] Audio telecommunication applications can be implemented over a packet-switched network. For example, audio telecommunication applications can be implemented over a Voice over Internet Protocol (VoIP) network.
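As a non-limiting illustration of the segmentation described in paragraph [0071], the following sketch (in Python; all names are illustrative and not part of this disclosure) divides a sampled signal into non-overlapping twenty-millisecond frames:

    # Illustrative sketch: divide a sampled audio signal into
    # non-overlapping frames, per the segmentation of paragraph [0071].
    def split_into_frames(samples, sample_rate_hz, frame_ms=20):
        """Return a list of non-overlapping frames of frame_ms milliseconds,
        discarding any final partial frame. Overlapping segmentations (for
        example, 25% or 50% overlap) are also possible, as noted above."""
        frame_len = (sample_rate_hz * frame_ms) // 1000
        n_frames = len(samples) // frame_len
        return [samples[i * frame_len:(i + 1) * frame_len]
                for i in range(n_frames)]

    # At a sampling rate of 16 kHz, a twenty-millisecond frame holds 320 samples.
    frames = split_into_frames([0.0] * 16000, 16000)  # one second of signal
    assert len(frames) == 50 and len(frames[0]) == 320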
A packet can include one or more frames of the encoded audio signal, and packets carrying audio information (for example, voice) can be transmitted from a first device to a second device over the network. However, some of the packets may be lost during transmission. For example, the loss of multiple packets (sometimes referred to as bursty packet loss) can be a cause of degradation of perceived voice quality at a receiving device.
[0073] In order to alleviate the degradation of perceived voice quality caused by packet loss in a VoIP network, two types of solutions exist. The first solution is a receiver-based packet loss concealment (PLC) approach. A PLC method can be used to mask the effects of packet loss in VoIP communications. For example, a PLC method can be implemented to create a substitute packet to replace one that was lost during transmission. Such a PLC method can try to create a packet as similar as possible to the one that was lost. Receiver-based PLC methods may not need any additional resources or help from the sender to create the substitute packet. When important speech frames are lost, however, a PLC method can be ineffective at masking the effects of the packet loss.
[0074] The second solution is a sender-based packet loss resilient approach. This approach includes forward error correction (FEC) methods, which may include sending some additional data with each packet. The additional data can be used to restore errors caused by data loss during transmission. For example, FEC schemes can transmit redundant audio frames. In other words, more than one copy (typically two) of an audio frame is transmitted by the sender. These two frames can be referred to as a primary copy and a redundant copy.
[0075] Although sender-based packet loss resilient schemes can improve the perceptual quality of decoded speech, these schemes can also increase the bandwidth used during speech transmission. Traditional FEC schemes can also increase end-to-end delay, which can be intolerable for real-time conversations. For example, conventional sender-based schemes send the same speech frame twice over two different time periods. This approach can at least double the data rate. Some conventional schemes may use a low bit rate codec for the redundant copy in order to reduce the data rate. However, using a low bit rate codec can add complexity to the encoder. In addition, some conventional systems may use the same low bit rate codec for both the primary copy of the frame and the redundant copy of the frame. While this approach can reduce complexity at the encoder as well as reduce the data rate, the baseline voice quality (that is, the speech quality when no frames are lost) can be greatly reduced. In addition, conventional sender-based schemes typically operate under the assumption of an additional delay of at least one frame interval.
[0076] The systems, methods and apparatus described herein can be implemented to provide a source-controlled (and possibly channel-controlled) FEC scheme in order to achieve an optimal trade-off between speech quality, delay and data rate. The FEC scheme can be configured such that no additional delay is introduced. A significant improvement in speech quality can be achieved under only a moderate increase in data rate. An FEC scheme as described herein can also operate at any target data rate. In one example, the FEC scheme and target data rate can be adjusted adaptively based on the condition of the transmission channel, as well as on external controls.
The proposed FEC scheme can also be implemented to be compatible with legacy communication devices (for example, legacy telephone sets).
[0077] For some codecs for audio communications (for example, voice), the total number of bits used to encode each frame is a predetermined constant. Examples of such codecs include the Adaptive Multi-Rate (AMR) speech codec (for example, as described in 3GPP Technical Specification (TS) 26.071, version 11.0.0, September 2012, available from the European Telecommunications Standards Institute (ETSI), www-dot-etsi-dot-org, Sophia Antipolis, FR) and the AMR Wideband speech codec (for example, as described in ITU-T Recommendation G.722.2, July 2003, of the International Telecommunication Union, www-dot-itu-dot-int, and/or 3GPP TS 26.190 v11.0.0, September 2012, available from ETSI), in which the number of bits is determined by the coding mode selected for the frame. In such cases, the transmission of a redundant copy of a previous frame may require a corresponding reduction in the number of bits available for encoding the signal information of the current frame. This reduction can have a negative impact on the perceptual quality of the decoded speech.
[0078] It may be desirable to apply a resilient approach in which redundant copies are transmitted only for critical frames. A "critical frame" is a frame whose loss is expected to have a significant impact on the perceived quality of the decoded signal. Furthermore, it may be desirable to transmit such a redundant copy only if the impact of piggybacking the redundant copy onto the current frame is expected to be minimal. For a fixed bit rate system, it may be desirable to determine a number of bits to be used for encoding the current frame, such that the sum of the number of bits used to encode the current frame and the number of bits used to encode a redundant copy (for example, a partial copy) of the past frame meets a fixed target bit rate T.
[0079] Figure 1A is a block diagram illustrating an example of a transmitting terminal 102 and a receiving terminal 104, which communicate over a network NW10 via transmission channel TC10. Each of the terminals 102 and 104 can be implemented to perform a method as described herein and/or to include an apparatus as described herein. The transmitting and receiving terminals 102, 104 can be any devices capable of supporting voice communications, including telephones (for example, smartphones), computers, audio broadcast and reception equipment, videoconferencing equipment, or the like. The transmitting and receiving terminals 102, 104 can be implemented, for example, with wireless multiple access technology, such as Code Division Multiple Access (CDMA) capability. CDMA is a modulation and multiple-access scheme based on spread-spectrum communications.
[0080] The transmitting terminal 102 includes an audio encoder AE10, and the receiving terminal 104 includes an audio decoder AD10. The audio encoder AE10 can be used to compress audio information (for example, speech) from a first user interface UI10 (for example, a microphone and audio front end) by extracting parameter values according to a model of human speech generation. A channel encoder CE10 assembles the parameter values into packets, and a transmitter TX10 transmits the packets including these parameter values over the network NW10, which can include a packet-based network such as the Internet or a corporate intranet, via transmission channel TC10.
Transmission channel TC10 can be a wired and/or wireless transmission channel and can be considered to extend to an entry point of the network NW10 (for example, a base station controller), to another entity within the network NW10 (for example, a channel quality analyzer), and/or to a receiver RX10 of the receiving terminal 104, depending on how and where the quality of the channel is determined.
[0081] A receiver RX10 of the receiving terminal 104 is used to receive the packets from the network NW10 via a transmission channel. A channel decoder CD10 decodes the packets to obtain the parameter values, and an audio decoder AD10 synthesizes the audio information from the parameter values in the packets. The synthesized audio (for example, speech) is provided to a second user interface UI20 (for example, an audio output stage and a speaker) on the receiving terminal 104. Although not shown, various signal processing functions can be performed in the channel encoder CE10 and channel decoder CD10 (for example, convolutional encoding, including cyclic redundancy check (CRC) functions, and interleaving) and in the transmitter TX10 and receiver RX10 (for example, digital modulation and corresponding demodulation, spread-spectrum processing, and analog-to-digital and digital-to-analog conversion).
[0082] Figure 2 shows an example of an implementation NW20 of network NW10 that includes base transceiver stations BTS1-BTS3, which communicate with mobile stations via uplink and downlink radio transmission channels. The network NW20 also includes core network CNW1, which is connected to the public switched telephone network PSTN and to the Internet INT, and core network CNW2, which is also connected to the Internet INT. The network NW20 also includes base station controllers BSC1-BSC3 that interface the transceiver stations with the core networks. The network NW20 can be implemented to provide packet-switched communications between terminal devices. Core network CNW1 can also provide circuit-switched communications between terminal devices MS1 and MS2, via base transceiver stations BTS1, BTS2, and/or between such a terminal device and a terminal device on the PSTN.
As shown in figure 1A, terminals 102, 104 are described with an audio encoder AE10 on one network terminal NW10 and an audio decoder AD10 on the other. [0085] In at least one transmission terminal configuration 102, an audio signal (for example, speech) can be introduced from the first UI10 user interface for AE10 audio encoder in frames, with each frame further divided into subframes. Such arbitrary frame limits can be used when some block processing is performed. However, such partitioning of audio samples into frames (and subframes) can be omitted if continuous processing instead of block processing is performed. In the examples described, each packet transmitted over the NW10 network can include one or more frames according to the specific implementation and the general restrictions of the project. [0086] AE10 audio encoder can be a variable rate or single rate encoder. A variable rate encoder can dynamically switch between various encoder modes (for example, the different fixed rates) from frame to frame, depending on the audio content (for example, depending on whether speech is present and / or what type of speech is gift). The AD10 audio decoder can also dynamically switch between the corresponding decoder modes from frame to frame, correspondingly. A particular mode can be chosen for each frame to achieve the lowest available bit rate, while maintaining acceptable signal reproduction quality at the receiving terminal 104. [0087] AE10 audio encoder normally processes the input signal as a series of segments that do not overlap in time or "frame", with a new encoded frame being calculated for each frame. The frame period is generally a period during which the signal is expected to be locally fixed; common examples include twenty milliseconds (equivalent to 320 samples at a sampling rate of 16 kHz, 256 samples at a sampling rate of 12.8 kHz or 160 samples at a sampling rate of eight kHz) and ten milliseconds. It is also possible to apply AE10 audio encoder to process the input signal as a series of overlapping frames. [0088] Figure 1B shows a block diagram of an AE20 implementation of AE10 audio encoder that includes an FE10 frame encoder. Frame encoder FE10 is configured to encode each of a sequence of CF frames from the input signal to produce a corresponding frame of a sequence of EF encoded audio frames. AE10 audio encoder can also be implemented to perform additional tasks, such as splitting the input signal into frames and selecting an encoding mode for the FE10 frame encoder. Selecting an encoding mode (for example, rate control) may include performing voice activity detection (VAD) and / or otherwise classifying the frame's audio content. In this example, audio encoder AE20, also includes a VAD10 voice activity detector that is configured to process FC core audio frames to produce a VS voice activity detection signal (for example, as described in 3GPP TS 26.194 v11 .0.0, September 2012, available on ETSI). [0089] Frame encoder FE10 is typically implemented according to a source filter model that encodes each frame of the audio input signal, as (A) a set of parameters describing a filter and (B) an excitation signal that it will be used in the decoder to conduct the described filter to produce a synthesized reproduction of the audio frame. The spectral envelope of the speech signal is typically characterized by peaks that represent the resonances of the vocal tract (for example, the mouth and throat) and are called formants. 
Most speech encoders code at least this coarse spectral structure as a set of parameters, such as the filter coefficients. The residual signal can be modeled as a source (for example, as produced by the vocal cords) that drives the filter to produce the voice signal and is typically characterized by its intensity and tone. [0090] Figure 3 shows a block diagram of a basic FE20 implementation of FE10 frame encoder, which includes a PP10 preprocessing module, a linear prediction encoding (LPC) search module, LA10, a search module open loop pitch OL10, an adaptive code book search module (ACB) AS10, a fixed code book search module (FCB) FS10, and a GV10 gain vector quantization module (VQ). PP10 pre-processing module can be implemented, for example, as described in section 5.1 of 3GPP TS 26.190 v11.0.0. In one example, the PP10 preprocessing module is implemented to perform downward sampling of the core audio frame (for example, from 16 kHz to 12.8 kHz), high pass filtering of downwardly sampled frame (for example, with a frequency 50 Hz cutoff), and pre-emphasis of the filtered frame (for example, using a first order high pass filter). [0091] Linear prediction encoding (LPC) analysis module LA10 encodes each core audio frame's spectral envelope as a set of linear prediction (LP) coefficients (for example, the coefficients of a filter with all poles 1 / A (z)). In one example, the LPC analysis module LA10 is configured to calculate a set of sixteen LP filter coefficients to characterize the forging structure of each 20 millisecond frame. Analysis module LA10 can be implemented, for example, as described in section 5.2 of 3GPP TS 26.190 v10.0.0. [0092] Analysis module LA10 can be configured to analyze each frame sample directly, or the samples can be weighted first according to a window function (for example, a Hamming window). The analysis can also be performed through a window that is larger than the frame, such as a 30 ms window. This window can be symmetrical (for example, 5-20-5, such that it includes the 5 milliseconds immediately before and after the 20 millisecond frame) or asymmetrical (for example, 10-20, such that it includes the last 10 milliseconds of the preceding frame). An LPC analysis module is typically configured to calculate the LP filter coefficients using a Levinson-Durbin or Leroux-Gueguen recursive algorithm. Although LPC encoding is well suited for speech, it can also be used to encode generic audio signals (for example, including non-speech, such as music). In another implementation, the search module can be configured to calculate a set of cepstral coefficients for each frame instead of a set of LP filter coefficients. [0093] Linear prediction filter coefficients are usually difficult to quantify efficiently and are usually mapped to another representation, such as spectral line pairs (LSPs) or spectral line frequencies (LSFs), or imitation spectral pairs (ISPs) or imitation spectral frequencies (ISFs), for entropy quantification and / or coding. In one example, the LA10 analysis module transforms the LP filter coefficient set into a corresponding ISF set. Other one-to-one representations of LP filter coefficients include plot coefficients and log-area ratio values. Usually a transform between a set of LP filter coefficients and a corresponding set of LSFs, PEL, ISF, or ISPs is reversible, but modalities also include implementations of the LA10 analysis module, in which the transform is not reversible without error. 
[0094] Analysis module LA10 is configured to quantify the set of ISF (or LSFs or other coefficient representation) and FE20 frame encoder is configured to output the result of this quantization as an LPC XL index. Such a quantizer typically includes a vector quantizer that encodes the input vector, as an index to a corresponding vector input in a table or codebook. [0095] Frame encoder FE20 also includes an optional open loop pitch search module OL10 that can be used to simplify pitch analysis and reduce the scope of closed loop pitch search in AS10 adaptive code book search module . OL10 module can be implemented to filter the input signal through a weighting filter that is based on the non-quantized LP filter coefficients, decimate the weighted signal by two, and produce a pitch estimate, once or twice per frame (depending on the current rate). OL10 module can be implemented, for example, as described in section 5.4 of 3GPP TS 26.190 v10.0.0. [0096] The adaptive code book search module (ACB) AS10 is configured to search the adaptive code book (based on past excitation and also called "pitch code book") to produce the delay and gain of the field filter. AS10 module can be implemented to perform closed loop pitch search around open loop pitch estimates on a subframe basis on a target signal (obtained, for example, by filtering residual LP through a synthesis filter weighted based on quantized and non-quantized LP filter coefficients) and then calculate the adaptive code vector by interpolating the excitation passed in the indicated fractional pitch delay and calculating the ACB gain. AS10 module can also be implemented to use the residual LP to extend the past excitation store to simplify the closed loop pitch search (especially for delays smaller than the 64 sample subframe size). AS10 module can be implemented to produce an ACB gain (for example, for each subframe) and a quantization index that indicates the pitch delay of the first subframe (or the pitch delays of the first and third subframes, depending on the current rate) and relative pitch delays from other subframes. AS10 module can be implemented, for example, as described in section 5.7 of 3GPP TS 26.190 v10.0.0. [0097] The FS10 fixed code book (FCB) search module is configured to produce an index that indicates a fixed code book vector (also called "innovation code book", "innovative code book", "stochastic code book" ", or" algebraic code book "), which represents the portion of the excitation that is not modeled by the adaptive code vector. FS10 module can be implemented to produce the codebook index as a codeword that contains all the information necessary to reproduce the FCB vector (for example, it represents the pulse positions and signals), in such a way that no codebook is necessary. FS10 module can be implemented, for example, as described in section 5.8 of 3GPP TS 26.190 v10.0.0. [0098] GV10 gain vector quantization module is configured to quantify FCB and ACB gains, which can include the gains for each subframe. GV10 module can be implemented, for example, as described in section 5.9 of 3GPP TS 26.190 v10.0.0 [0099] As an alternative to the codebook based approach, a transform based approach can be used to encode the residual signal LPC. For example, a modified discrete cosine transform (MDCT) can be used to encode the residue in parameters that include a set of MDCT coefficients, such as the Calliope super broadband codec (QUALCOMM Inc., San Diego, CA) and the option of the AMR-WB + codec. 
In another example, a transform-based approach is used to encode an audio signal without performing an LPC analysis.
[00100] Figure 5A shows a flow chart of a method M100 of processing an audio signal according to a general configuration, which includes tasks T200 and T300. Task T200 calculates at least one value of a decision metric for a second frame of the audio signal (the "subsequent frame" or "carrier frame"), which is subsequent in the audio signal to a first frame (for example, a critical frame) of the audio signal. Based on the at least one calculated value of the decision metric, task T300 selects one from among a plurality of reallocation candidates, where the selected reallocation candidate indicates a reallocation of an initial bit allocation T for the subsequent frame into a first portion and a second portion. In a typical implementation, the first portion of the initial bit allocation T is then used to carry a copy of the subsequent frame, and the second portion of the initial bit allocation is used to carry a redundant copy of the critical frame.
[00101] It may be desirable to reduce the probability that the carrier frame will itself also be a critical frame (that is, critical for another frame that is subsequent to it). Usually this probability is highest for the frame that immediately follows the critical frame and then decreases rapidly for subsequent frames. For voiced speech, it is typical for the onset frame of a speech burst to be critical and for the frame that immediately follows it to be critical as well (for example, to cover the case in which the onset frame is lost). However, it is also possible for another frame within a speech burst to be critical (for example, in a case of a pitch lag change).
[00102] A frame offset k can be used to indicate the distance between the critical frame and the carrier frame. In one example, the value of frame offset k is the difference in frame number between critical frame n and carrier frame (n + k) (that is, one more than the number of intervening frames). Figure 6A shows a typical example in which the value of k is three. In another example, the value of k is four. Other possible values include one, two, three, five, and integers greater than five.
[00103] Method M100 can be implemented such that the offset k is fixed (for example, at system implementation or at call establishment). The value of k can be chosen according to the length of a frame (for example, in milliseconds) in the original time-domain signal and a maximum allowed delay. For example, the value of k can be constrained by a maximum allowable value (for example, to limit the frame delay). It may be desirable for the maximum allowed delay to be about eighty or one hundred milliseconds. In such a case, k can have a maximum value of four or five for a scheme using twenty-millisecond frames, or a maximum value of eight, nine, or ten for a scheme using ten-millisecond frames.
[00104] The value of offset k can also be selected and/or updated during a call according to channel conditions (for example, as indicated by feedback from a receiver). For example, it may be desirable to use a higher value of k in an environment that is causing frequent loss of consecutive frames (for example, due to prolonged channel degradations).
[00105] The receiving terminal 104 can also feed channel status information 120 back to the transmitting terminal 102.
In one example, the receiving terminal 104 is configured to collect information related to the quality of the transmission channel carrying the packets from transmitting terminal 102. The receiving terminal 104 can use the collected information to estimate the quality of the channel. The collected information and/or the channel quality estimate can then be fed back to the transmitting terminal 102 as channel status information.
[00106] Figure 4 is a block diagram illustrating an example of an implementation 112 of transmitting terminal 102 and an implementation 114 of receiving terminal 104, which communicate over the network NW10 via transmission channels TC10 and RC10. In this example, the receiving terminal 114 includes an instance CE20 of channel encoder CE10 that can assemble the collected information and/or quality estimate (for example, from audio decoder AD10) into a packet for transmission, via an instance TX20 of transmitter TX10 and transmission channel RC10, back to the transmitting terminal 112, where the packet is received by an instance RX20 of receiver RX10 and disassembled by an instance CD20 of channel decoder CD10, and the information and/or estimate is provided to the audio encoder AE10. The transmitting terminal 112 (for example, audio encoder AE10) can use this channel status information to adapt one or more functions (for example, an offset and/or a criticality threshold) that are associated with a packet loss resilient scheme as described herein.
[00107] The offset k indicates the length of an interval between the transmission time of the primary copy of a frame and the transmission time of the redundant copy of the frame. Typically, packet losses in a packet-switched network are bursty, and burst lengths can differ under different network conditions. Thus, using a dynamically adjusted offset can result in better error protection performance. An optimal offset can be estimated using the channel status information sent by the receiver. For example, the offset value can be adjusted adaptively (for example, at run time) based on the condition of the channel. Alternatively, the offset value can be predetermined.
[00108] In one example, task T200 calculates an open-loop decision metric D that is based on information from the frame. Figure 5B shows a flow chart of an implementation M200 of method M100, which includes such an implementation T210 of metric calculation task T200. Task T210 can be implemented to calculate the open-loop metric D as, for example, a measure of compressibility of the subsequent frame. Such a measure can be calculated as a correlation of the subframes of the subsequent frame with one another (for example, as the maximum correlation over all possible delay values and over all pairs (or all adjacent pairs) of subframes, or as an average of the maximum correlation over all possible delay values for each pair (or each adjacent pair) of subframes). Such a measure can be considered a static measure of the compressibility of the frame. One example of metric D is a measure R_ij^p of a correlation at delay p between two subframes v_i and v_j of length S, which can be calculated using an expression such as the normalized cross-correlation R_ij^p = (Σ_s v_i(s) v_j(s + p)) / sqrt((Σ_s v_i(s)^2) (Σ_s v_j(s + p)^2)), where each sum runs over the samples s for which both subframes are defined.
[00109] In one example, a twenty-millisecond frame is divided into three subframes of lengths 53, 53, and 54 samples, respectively. In another such example, a twenty-millisecond frame is divided into four five-millisecond subframes.
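A minimal sketch of such a static compressibility measure, assuming the normalized correlation form given above and taking the maximum over all adjacent subframe pairs and candidate delays (the delay range and all names are illustrative):

    import math

    def norm_corr(vi, vj, p):
        """Normalized correlation R_ij^p between subframes vi and vj at delay p."""
        n = min(len(vi), len(vj) - p)
        if n <= 0:
            return 0.0
        num = sum(vi[s] * vj[s + p] for s in range(n))
        den = math.sqrt(sum(vi[s] * vi[s] for s in range(n)) *
                        sum(vj[s + p] * vj[s + p] for s in range(n)))
        return num / den if den > 0.0 else 0.0

    def open_loop_metric_D(subframes, max_delay):
        """Decision metric D as the maximum correlation over all adjacent
        subframe pairs and all delays 0..max_delay; averaging the per-pair
        maxima, as mentioned in paragraph [00108], is an alternative."""
        return max(norm_corr(subframes[i], subframes[i + 1], p)
                   for i in range(len(subframes) - 1)
                   for p in range(max_delay + 1))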
Metric D can be selected such that, for example, a high value of D indicates a compressible frame and a low value of D indicates a frame that resists compression.
[00110] Task T300 selects one from among a plurality of reallocation candidates, based on the at least one calculated value of the decision metric. Figure 5C shows a flow chart of an implementation M210 of method M200. Method M210 includes an implementation of task T300 as a loop that includes a comparison task T310 and is configured to iterate through a set of threshold values V1 to VM. Task T310 compares the value of D with a current one of the set of threshold values. In this non-limiting example, the set of threshold values is ordered such that Vq < Vq+1 for all integers q from 1 to (M-1), and the loop is configured to start at the value VM. In one example, the value of M is three, although other possible values include two, four, five, and integers greater than five.
[00111] Starting from a value of one for the reallocation index m, the loop shown in figure 5C selects the value of m such that the value of D is not less than (alternatively, is greater than) the threshold value Vm. In a typical implementation, a copy of the subsequent frame and a redundant copy of the critical frame are encoded within the initial bit allocation T according to the reallocation candidate indicated by the selected value of index m.
[00112] Each of the reallocation candidates indicates a distribution of the initial bit allocation between at least the subsequent frame and the critical frame. For example, each distribution Nm can indicate a division of the initial bit allocation T into an allocation of Nm bits for the critical frame and an allocation of (T - Nm) bits for the subsequent frame. In other cases, it is possible for a distribution to indicate an allocation of part of the total allocation of T bits for encoding another frame and/or other information as well.
[00113] Figure 6B shows ranges of the value of decision metric D, as defined by the threshold values V1 to VM, and a correspondence between each of these ranges and a different one of a plurality of distributions of the initial bit allocation T between a first (carrier) portion and a second (redundant) portion. In this example, each distribution is defined by a number N1 to NM, which can indicate a number of bits in the second portion, or a bit rate of a frame to be encoded into the second portion (this example also includes a fallback distribution N0, as discussed below). Metric D can be selected such that, for example, a high value of D indicates a compressible frame and a low value of D indicates a frame that resists compression. In such a case, for a value of D that indicates a compressible frame (that is, a high value of D), a low rate (that is, a small redundant portion) may be sufficient. For a value of D that indicates a frame that resists compression (that is, a low value of D in this example), a higher rate (that is, a larger redundant portion) may be desired. In the non-limiting example of method M210, the set of reallocation candidates is ordered such that Np < Np+1 for all integers p from 1 to (M-1).
[00114] It is expressly noted that for most critical frames, the loop in method M210 can iterate fewer than M times.
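A sketch of such a selection, assuming thresholds V1 < ... < VM scanned from VM downward and a caller-supplied correspondence between each range of D and a redundant-portion candidate; as the surrounding paragraphs note, the particular correspondence and loop structure are merely examples, and the numbers here are hypothetical:

    def select_reallocation(D, thresholds, candidates, fallback):
        """Select a redundant-portion distribution from the range in which
        decision metric D falls. thresholds = [V1, ..., VM], ordered
        ascending; candidates[m] is the distribution associated with the
        range whose lower bound is thresholds[m]; fallback is the
        distribution N0 (or None, if no redundant copy is to be sent)
        used when D falls below V1."""
        for m in reversed(range(len(thresholds))):  # scan from VM down to V1
            if D >= thresholds[m]:
                return candidates[m]
        return fallback

    # Hypothetical correspondence for M = 3, in which a high D (a compressible
    # carrier frame) maps to a small redundant portion, per the discussion above.
    V = [0.2, 0.5, 0.8]   # V1 < V2 < V3
    N = [192, 128, 64]    # redundant-portion sizes, in bits
    assert select_reallocation(0.9, V, N, fallback=None) == 64
    assert select_reallocation(0.1, V, N, fallback=None) is None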
It may be desirable, for example, to implement method M200 (for example, to select an appropriate decision metric and set of threshold values) such that for most critical frames, it will not be necessary for the method to test every threshold value in the set before a satisfactory reallocation candidate is identified.
[00115] If task T310 fails for all threshold values in the set, method M210 can determine that a redundant copy of the critical frame cannot be transmitted. Alternatively, method M210 can be implemented to include a fallback case, as shown in figure 6B. Figure 6C shows a flow chart of an implementation M220 of method M200 having a loop, including an alternative implementation T320 of selection task T300, which is configured to start with a value of M for the reallocation index m. Method M220 can also be implemented to include a fallback case, as shown in figure 6B.
[00116] It should be understood that the particular loop structures shown in figures 5C and 6C, and the particular correspondence between ranges of the value of metric D and reallocations of the initial bit allocation, are merely examples, and that any appropriate selection loop and any appropriate correspondence between the elements of an ordered set of decision metric threshold values V1 to VM and the corresponding elements of an ordered set of redundant-portion reallocations N1 to NM may be used. It should also be noted that the open-loop example of decision metric D as described above is merely an example, and that the disclosed principles of matching a decision metric to reallocations can be applied to any decision metric (for example, open-loop or closed-loop) that measures the effect on perceptual quality of reducing the bit rate of a carrier frame to accommodate a redundant copy of a critical frame.
[00117] It may be desirable to implement method M100 to select frame (n + k) (for example, by selecting a value for offset k) based on information from one or more frames that are subsequent to the critical frame. In such a case, it may be desirable to select frame (n + k) to minimize the impact on perceptual quality if the critical frame is not lost in the channel. For example, it may be desirable to select the most compressible of the subsequent frames as frame (n + k), subject to a maximum delay constraint K. Figure 7A shows a flow chart of an implementation M300 of method M100 that includes an implementation T220 of metric calculation task T200. Task T220 calculates at least one decision metric value for each of a plurality of frames that are subsequent in the audio signal to the critical frame. Method M300 also includes an implementation T350 of task T300 that selects one from among a plurality of reallocation candidates and one from among the plurality of subsequent frames (for example, by selecting the corresponding value of offset k).
[00118] Figure 7B shows a flow chart of an implementation M310 of method M300. Method M310 includes an implementation of task T220 as a loop that includes a calculation task T230. Task T230 calculates a value of decision metric D as described herein for the frame indicated by the current value of offset k. Method M310 also includes an implementation of task T350 as a loop that includes a comparison task T330 and is configured to iterate through the set of threshold values V1 to VM in reverse order.
In this non-limiting example, the set of threshold values is ordered such that Vq < Vq+1 for all integers q from 1 to (M-1), the loop that includes task T230 is configured to start at the value k = 1, and the loop that includes task T330 is configured to start with the value VM. If task T330 fails for all threshold values in the set, method M310 can determine that a redundant copy of the critical frame cannot be transmitted at offset k. If task T330 fails for all threshold values in the set and for all values of k, method M310 can determine that a redundant copy of the critical frame cannot be transmitted. Alternatively, method M310 can be implemented to include a default value of offset k (for example, three or four) as a fallback.
[00119] It is expressly noted that for most critical frames, task T330 can compare a value of the decision metric to the M values of the set of threshold values for each of fewer than K frames. It may be desirable, for example, to implement method M300 (for example, to select an appropriate decision metric and set of threshold values) such that for most critical frames, it will not be necessary for the method to test each of the K subsequent frames before a satisfactory frame and reallocation candidate are identified. It is also possible to implement method M300 (for example, method M310) such that M is equal to one (for example, only one threshold) and K is greater than one (for example, several possible offsets).
[00120] It is contemplated that the same sets of M threshold values and reallocation candidates will be used for all frames of the plurality of K subsequent frames, but it is also possible to use a different set of threshold values and/or reallocation candidates for different subsequent frames (for example, according to a speech mode and/or other characteristics of the subsequent frame), and in such a case it is possible for each set of reallocation candidates to have a different respective number of elements M.
[00121] In another example, task T200 is implemented to calculate a set of values of a closed-loop decision metric. In this example, each calculated value is based on a dynamic measure of compressibility, such as a measure that is based on information from respective encoded versions of the subsequent frame. Figure 8A shows a flow chart of an implementation M400 of method M100, which includes such an implementation T250 of metric calculation task T200. Task T250 can be implemented to calculate the decision metric based on, for example, a measure of perceptual quality. Such a metric can be calculated, for each reallocation candidate, as a measure of an associated change (for example, reduction) in the perceptual quality of the carrier frame. For example, such a metric can be calculated as a difference (for example, an absolute difference) or a ratio between (A) a measure of perceptual quality of the carrier frame as encoded using the entire initial bit allocation T and (B) a measure of perceptual quality of the carrier frame as encoded using only the carrier portion of the initial bit allocation.
[00122] Figure 8B shows a flow chart of an implementation M410 of method M400. Method M410 includes an implementation of task T250 as a loop that has a calculation subtask T260 and is configured to iterate through a set of reallocation indices 1 to M. Task T260 calculates a value Dm of the decision metric for frame (n + k) and the reallocation candidate indicated by the current index value.
In this example, Dm = |Q(T) - Q(T - Nm)|, where Q(x) is a measure of perceptual quality of frame (n + k) as encoded using x bits. Such an example of the measure Dm can also be considered a quality cost of the distribution Nm for frame (n + k), and other such quality costs (for example, relative to the quality of frame (n + k) as encoded using T bits) can also be used.
[00123] The quality measure Q(x) can be a full-reference metric, a no-reference metric, or a reduced-reference metric. Examples of the measure Q(x) include perceptually weighted distortion measures (for example, enhanced modified Bark spectral distance, or EMBSD; the measuring normalizing blocks, or MNB, algorithm, as described, for example, in ITU-T Recommendation P.861); the word error rate produced by a speech recognizer (for example, one applying hidden Markov models) on the original and decoded signals; and a version of the E-model (for example, as described in ITU-T Recommendations G.107 and G.108), which produces an R value that can be mapped to an estimated mean opinion score (MOS). Other examples of indicators (for example, objective metrics) that can be used for Q(x) include signal-to-noise ratio (SNR), perceptually weighted SNR (for example, weighted using the LP coefficients of frame (n + k)), segmental SNR, perceptually weighted segmental SNR, cepstral distance, and Bark spectral distance. Other examples of objective metrics that can be used for Q(x) include a perceptual speech quality measure (PSQM) (for example, as described in ITU-T Recommendation P.861), a noise estimate as produced by such a measure, and other metrics as described in, for example, ITU-T Recommendations P.861 and P.862 (for example, PSQM and PSQM+; perceptual evaluation of speech quality, PESQ). In another example, the decision metric Dm is calculated as an SNR, or perceptually weighted SNR, in which the signal quantity is based on the energy of frame (n + k) as decoded from a version encoded using T bits, and the noise quantity is based on the energy of a difference between the signal quantity and frame (n + k) as decoded from a version encoded using (T - Nm) bits.
[00124] Method M410 also includes an implementation of task T300 as a loop that has a comparison subtask T340 and is configured to iterate through the calculated set of decision metric values D1 to DM. Task T340 compares a threshold value Z with a current one of the set of decision metric values. In this non-limiting example, the set of values of the decision metric is ordered such that Dp < Dp+1 for all integers p from 1 to (M-1). In one example, the value of M is three, although other possible values include two, four, five, and integers greater than five.
[00125] Starting from a value of one for the reallocation index m, the closed loop that includes task T340 selects the first value of m such that the value of Dm is not greater than (alternatively, is less than) the threshold value Z. Method M400 can be implemented to apply such a selection by encoding a copy of frame (n + k) and a redundant copy of the critical frame according to reallocation candidate Nm. If task T340 fails for all values in the set, method M410 can determine that a redundant copy of the critical frame cannot be transmitted. Alternatively, method M410 can be implemented to include a fallback case (for example, a default reallocation).
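A sketch of the selection of method M410, assuming a caller-supplied routine that encodes frame (n + k) under a given bit budget and returns a quality measure Q(x) (for example, one of the objective metrics listed in paragraph [00123]); the name and signature are illustrative:

    def select_closed_loop(quality_Q, T, candidates, Z, fallback=None):
        """For reallocation candidates N_1 <= ... <= N_M, compute the quality
        cost D_m = |Q(T) - Q(T - N_m)| and return the first candidate whose
        cost does not exceed threshold Z. quality_Q(x) returns a perceptual
        quality measure of frame (n + k) as encoded using x bits. Returns
        fallback (for example, a default reallocation, or None, meaning that
        the redundant copy is not sent) if every candidate fails."""
        q_full = quality_Q(T)  # quality using the entire initial allocation T
        for n_m in candidates:
            if abs(q_full - quality_Q(T - n_m)) <= Z:
                return n_m  # first (lowest-index) passing candidate
        return fallback

Because each candidate evaluation involves encoding the carrier frame again, exiting at the first passing candidate limits the added encoder complexity, in keeping with the observation in the following paragraph.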
[00126] It is expressly noted that for the most critical frames, task T340 may compare fewer than all M values of the decision metric with the threshold value Z. It may be desirable, for example, to implement method M400 (for example, to select an appropriate decision metric, threshold value, and set of reallocation candidates) such that, for the most critical frames, it will not be necessary for the method to test each of the M values before a satisfactory reallocation for that frame is identified. [00127] Figure 9A shows a flowchart for an alternative implementation M420 of method M400 in which a single loop encompasses both tasks T260 and T340. Figure 9B shows a flowchart for an implementation M430 of method M400 that has an alternative loop structure, which is configured to start with a value of M for reallocation index m. Methods M420 and M430 can also be implemented to include a fallback case (for example, a default reallocation). It should be understood that the particular loop structures shown in figures 8B, 9A, and 9B are only examples, and that any suitable selection logic can be used to implement method M400. [00128] In a similar manner as discussed herein with reference to method M300, it may be desirable to implement method M400 to select a value for the offset k based on information from one or more frames that are subsequent to the critical frame. In such a case, it may be desirable to determine an appropriate value for offset k to minimize the impact on perceptual quality if the critical frame is not lost in the channel. For example, it may be desirable to select a value of k that satisfies a quality-change threshold Z, subject to a maximum delay constraint K. [00129] Figure 10A shows a flowchart for such an implementation M500 of method M400 that includes an implementation T270 of metric calculation task T250. Task T270 calculates a set of decision metric values for each of a plurality of frames that are subsequent in the audio signal to the critical frame. Method M500 also includes an instance of task T350 that selects one among a plurality of reallocation candidates and one among the plurality of subsequent frames (for example, by selecting the corresponding value of offset k). [00130] Figure 10B shows a flowchart for an implementation M510 of method M500. Method M510 includes an implementation of task T270 as a loop that includes an instance T280 of calculation task T260. This loop calculates a set of values D1 to DM of the decision metric, as described herein, for the frame indicated by the current value of offset k. This implementation also realizes task T350 with an instance of comparison task T340, as described herein. In this non-limiting example, the loop is configured to initialize both the offset index k and the reallocation index m to the value one. [00131] It is contemplated that the same threshold value Z and the same set of M reallocation candidates will be used for all frames of the plurality of K subsequent frames, but it is also possible to use a different threshold value Z and/or a different set of reallocation candidates for different subsequent frames (for example, according to a speech mode and/or other characteristics of the subsequent frame), and in such a case it is possible for each set of reallocation candidates to have a different respective number M of elements.
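The search of method M510 described in paragraphs [00129] to [00131] can be sketched, without limitation, as an outer loop over offsets k and an inner loop over reallocation candidates m, returning the first pair that satisfies the threshold Z. The function metric, which evaluates Dm for a given subsequent frame, and the default fallback pair are assumptions made here for illustration. Exchanging the two loops yields the variant noted at the end of paragraph [00133] below.

    def select_frame_and_candidate(metric, K, M, Z, fallback=(3, 1)):
        """Sketch of method M510: metric(k, m) returns the value D_m of the
        decision metric for frame (n+k) and reallocation candidate m. Both
        indices are initialized to one, as in the example of figure 10B."""
        for k in range(1, K + 1):      # offsets to subsequent frames
            for m in range(1, M + 1):  # reallocation candidates
                if metric(k, m) <= Z:  # first satisfactory pair wins
                    return k, m
        return fallback  # default offset k and reallocation index m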
[00132] It is expressly noted that for the most critical frames, task T340 may compare fewer than all M values Dm of the decision metric with the threshold value Z, for each of fewer than all K subsequent frames. It may be desirable, for example, to implement method M500 (for example, to select an appropriate decision metric, threshold value, and set of reallocation candidates) such that, for the most critical frames, it will not be necessary for the method to test each of the K subsequent frames before a subsequent frame and a satisfactory reallocation for that frame are identified. [00133] Method M510 can be implemented such that if task T340 fails for all reallocation candidates for a frame (n+k), the frame is encoded using T bits. If task T340 fails for all reallocation candidates for all candidate frames, method M510 can determine that a redundant copy of the critical frame cannot be transmitted. Alternatively, method M510 can be implemented to include default values of offset k (for example, three or four) and reallocation index m as a fallback. Figures 11A, 11B, and 12 show flowcharts for similar implementations M520, M530, and M540, respectively, of method M500 that have alternative loop structures. In another, non-limiting alternative, the loop structure of method M510 is rearranged so that the inner loop iterates through values of k (for example, frames) and the outer loop through values of m (for example, reallocation candidates). [00134] Method M100 can be performed in response to a determination that frame n is a critical frame. For example, it may be desirable to perform an instance of method M100 for each frame of the audio signal that is identified as being critical (that is, important to the quality of the decoded signal under packet loss conditions). Figure 13A shows a flowchart for an implementation M110 of method M100 that includes task T100, which identifies the critical frame. [00135] Task T100 can be implemented to indicate that a frame of the signal is critical by calculating a value of a criticality measure for the frame and comparing the calculated value with a threshold value. Such a criticality measure can be based on information within the frame and can also be based on information from one or more frames that are adjacent and/or subsequent to the frame in the input signal. Task T100 can be implemented to indicate that the frame is critical when the calculated value exceeds (alternatively, is not less than) the threshold value, which can be based on a coding mode selected for the frame. Task T100 can be implemented to execute for each frame of the audio signal, or only for certain frames (for example, frames that are identified as voiced, or transient, or onset frames; frames that are initially assigned at least a minimum bit rate; etc.). [00136] Task T100 can be implemented to calculate the criticality measure based on one or more criteria ranging from general characterization of the frame to specific assessments of the impact of its loss. This measure can be based on information within the frame and can also be based on information from one or more frames that are adjacent and/or subsequent to the frame in the input signal. [00137] A critical frame can be a frame that, when lost, can cause significant quality degradation. Different critical frames can have different levels of criticality.
For example, for two critical frames n1 and n2, if frame (n1+1) (that is, the frame subsequent to frame n1) is highly predictable from frame n1, while frame (n2+1) (that is, the frame subsequent to frame n2) does not depend on frame n2, then frame n1 may be more critical than frame n2, because losing frame n1 may cause quality degradation over more than one frame. [00138] Task T100 can be implemented to calculate the criticality measure based on an indication of the coding type of frame n (that is, the coding scheme to be used to encode the frame) and possibly of each of one or more frames adjacent and/or subsequent to frame n. Examples of such coding types may include code-excited linear prediction (CELP), noise-excited linear prediction (NELP), prototype waveform interpolation (PWI), prototype pitch period (PPP), etc. According to this criterion, for example, a critical CELP frame can be considered more important than a critical NELP frame. [00139] In addition, or alternatively, task T100 can be implemented to calculate a coding dependency estimate based on a speech mode of the model frame (that is, the classification of the frame's speech content) and possibly of the dependent frame and/or of each of one or more frames adjacent to the model frame. Examples of speech modes may include voiced, unvoiced, silence, and transient. The "voiced" classification can be further divided into onset and stationary. A transient classification can be further subdivided (for example, into up-transient and down-transient). According to this criterion, for example, a speech onset frame (an initial frame of a talk spurt) can be more critical than a stationary voiced frame, as the coding of subsequent frames in the talk spurt can depend heavily on information in the onset frame. In one example, task T100 is implemented to calculate the coding dependency estimate to indicate a high degree of dependency in response to an indication that the model frame is a speech onset frame and the dependent frame is a stationary voiced frame. [00140] In addition, or alternatively, task T100 can be configured to calculate the coding dependency estimate based on one or more other properties of the model frame (and possibly of the dependent frame and/or of each of one or more frames adjacent to the model frame). For example, if the values of some important parameters of a model frame differ significantly (for example, by more than a predetermined threshold) from the corresponding values of the preceding frame, then frame n may be a critical frame, since it cannot be easily predicted from the preceding frame, and loss of frame n may adversely affect subsequent frames, which are more similar to frame n than to the preceding frame. [00141] An example of such a property is an adaptive codebook (ACB) gain. A low ACB gain value for the model frame may indicate that the frame differs significantly from the preceding frame, while a high ACB gain value for a frame subsequent to frame n (for example, frame (n+1), (n+2), or (n+3)) may indicate that the frame is strongly dependent on frame n. In one example, task T100 uses information from frame n (for example, an excitation signal) to generate an adaptive code vector for the subsequent frame and to calculate an ACB gain value for a coded version of the subsequent frame. In this example, task T100 is implemented to calculate the criticality measure based on at least the calculated ACB gain value.
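As a non-limiting illustration of the criterion of paragraph [00141], the following sketch combines a low ACB gain for frame n with a high ACB gain for its subsequent frames into a single criticality value; the particular combination rule and the example threshold are assumptions made here for illustration only.

    def acb_criticality(acb_gain_n, acb_gains_subsequent):
        """Larger when frame n is poorly predicted from its predecessor
        (low gain for frame n) yet strongly predicts its successors
        (high gain for frames (n+1), (n+2), ...)."""
        return (1.0 - acb_gain_n) * max(acb_gains_subsequent)

    # e.g., an onset: weak prediction into frame n, strong prediction out of it
    measure = acb_criticality(0.1, [0.85, 0.9, 0.8])
    is_critical = measure >= 0.5  # threshold may be produced by task T50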
[00142] Another example of such a property is a perceptually weighted SNR (signal-to-noise ratio), which can be expressed as SNR = 10 log10 [ (Σ c²(i)) / (Σ e²(i)) ], where the sums run over i = 0, ..., L−1, L is the length of the frame in samples, c is the perceptually weighted signal obtained by filtering the decoded version of the model frame n with a perceptual weighting filter W(z), and e is a perceptually weighted error. The error e can be calculated, for example, as a difference between (A) a W(z)-filtered decoded version of the model frame n and (B) a W(z)-filtered error-concealed version of the model frame n (that is, a version reconstructed assuming that the frame is not available at the decoder). The error-concealed version can be calculated based on information from previous frames according to a frame erasure concealment algorithm. For example, the error-concealed version can be calculated according to the procedure described in 3GPP TS 26.091, v10.0.0 (April 2011, "Error concealment of lost frames", available from ETSI). In one example, W(z) = A(z/γ)H(z), where a1 to ap are the coefficients of the LPC filter A(z) for the model frame n, γ = 0.92, and H(z) = 1/(1 − 0.68 z^−1). In an alternative example, the error e is calculated by applying the filter W(z) to the difference between the decoded and error-concealed versions. [00143] In addition, or alternatively, task T100 can be configured to calculate the coding dependency estimate as an estimate of the impact of the loss of the model frame on the coding quality of one or more subsequent frames (for example, the dependent frame). For example, the criticality measure can be based on information from a coded version of each of one or more frames subsequent to frame n (for example, the adaptive codebook gain of frame n and/or of one or more of the subsequent frames). In addition, or alternatively, such a measure can be based on information from a decoded version of each of one or more other frames subsequent to the model frame (for example, a perceptually weighted SNR of the decoded version), where the subsequent frame was encoded without using information from frame n. [00144] An example of such a measure for a frame (n+q) relative to frame n can be expressed as SNR = 10 log10 [ (Σ c²(i)) / (Σ e²(i)) ], with the sums again over i = 0, ..., L−1, where L is the length of the frame in samples, c is the perceptually weighted signal obtained by filtering the decoded version of frame (n+q) with a perceptual weighting filter W(z), and e is a perceptually weighted error. The error e can be calculated in this case, for example, as a difference between (A) a W(z)-filtered decoded version of frame (n+q) without loss of frame n and (B) a W(z)-filtered decoded version of frame (n+q) assuming an error-concealed version of frame n. The filter W(z) can be calculated as described above, using the LPC filter coefficients for frame (n+q). In an alternative example, the error e is calculated by applying the filter W(z) to the difference between the normally decoded version and the version decoded assuming loss of frame n.
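A minimal sketch of the perceptually weighted SNR of paragraphs [00142] and [00144] follows, assuming the convention A(z) = 1 + a1 z^−1 + ... + ap z^−p for the LPC analysis filter and the example weighting filter W(z) = A(z/γ)H(z) given above; the helper names are illustrative.

    import numpy as np
    from scipy.signal import lfilter

    def perceptual_weight(x, lpc, gamma=0.92):
        """Filter x with W(z) = A(z/gamma) H(z), H(z) = 1/(1 - 0.68 z^-1),
        where lpc holds a_1..a_p of A(z) = 1 + sum_i a_i z^-i (an assumed
        sign convention)."""
        p = len(lpc)
        taps = np.concatenate(([1.0],
                               np.asarray(lpc) * gamma ** np.arange(1, p + 1)))
        y = lfilter(taps, [1.0], x)              # FIR part: A(z/gamma)
        return lfilter([1.0], [1.0, -0.68], y)   # IIR part: 1/(1 - 0.68 z^-1)

    def weighted_snr_db(decoded, concealed, lpc):
        """SNR = 10 log10(sum c^2(i) / sum e^2(i)) over the L frame samples."""
        c = perceptual_weight(np.asarray(decoded, float), lpc)
        e = c - perceptual_weight(np.asarray(concealed, float), lpc)
        return 10.0 * np.log10(np.dot(c, c) / np.dot(e, e))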
[00145] Task T100 can be implemented to indicate only active speech frames as critical frames. Alternatively, task T100 can be implemented to consider non-speech frames as potentially critical frames. Normally, in a two-way conversation, each party speaks only for some of the time, during which a communication system transmits that party's speech (for example, less than half the time), and for the rest of the time the communication system transmits silence or background noise. Infrequent transmission, or discontinuous transmission (DTX), during periods of silence (or background noise) has little impact on perceived conversation quality, but offers the benefits of reducing intra-/inter-cell interference (therefore potentially increasing system capacity) and conserving the battery power of a mobile unit during the conversation. [00146] A typical DTX scheme is performed by a speech encoder that uses voice activity detection (VAD). Using VAD, the encoder can distinguish active speech from background noise. In one example, audio encoder AE10 (for example, AE20) is implemented to encode each active speech segment (usually twenty milliseconds in length) into a packet at a target bit rate for transmission, and to represent critical background noise segments (also usually twenty milliseconds in length) with a relatively small packet. Such a small packet can be a silence descriptor (SID) indicating silence. A critical background noise segment can be a background noise segment that immediately follows a talk spurt, or a background noise segment whose characteristics differ significantly from those of its preceding noise segments. Other types of background noise segments (that is, non-critical background noise segments) can be indicated with zero bits, or not transmitted at all (that is, suppressed from transmission). When such an output packet pattern (that is, active segment(s), then critical background noise segment(s), then non-critical background noise segment(s)) depends purely on the input to the speech encoder, or the source, such a DTX scheme is called a source-controlled DTX scheme. [00147] It may be desirable to carry out real-time voice communication between a terminal A (for example, a transmitting user equipment, or UE, such as terminal 102) and a terminal B (for example, a receiving UE, such as terminal 104) over one or more packet-switched networks. Previous solutions, such as AMR and AMR-WB, adapt to bad channel conditions by reducing the bit rate (also called "rate adaptation"). For high-end codecs for use in VoIP (Voice over Internet Protocol), reducing the bit rate may not significantly help to reduce congestion in the networks (for example, due to RTP overhead, where RTP is the Real-time Transport Protocol, as described in, for example, RFC 3550, STD 64 (July 2003), Internet Engineering Task Force (IETF)). A method as described herein can impart greater robustness to the speech encoder and/or resolve codec performance problems due to channel impairments. [00148] The quality of the communication channel from transmitting terminal A to receiving terminal B can be estimated by network entities (for example, by a base transceiver station at the network end of the uplink radio channel, by a traffic analyzer in the core network, etc.) and/or by receiving terminal B (for example, by analyzing the packet loss rate). It may be desirable to transmit this information back to the transmitting UE using in-band messages, through control signals (for example, control packets using the RTP Control Protocol (RTCP), as described in, for example, RFC 1889 (January 1996, IETF)), and/or through another quality-of-service (QoS) feedback mechanism. Transmitting terminal A can be implemented to apply such information by switching to an operating mode (that is, a "channel-aware" mode) that is optimized for good performance under channel impairments.
In addition, the transmitting UE can be configured to select a channel-aware mode of operation at call setup time, if bad channel conditions can be anticipated (for example, on unmanaged networks). [00149] A vocoder can be implemented to switch to a "channel impairment robust mode" in response to an indication of poor channel conditions (for example, packet errors, high jitter, etc.). In the channel impairment robust mode, the speech codec can choose to retransmit certain critical frames of the input signal, either partially or totally. For example, a speech encoder operating in the channel impairment robust mode can be configured to transmit a redundant copy of a frame if the frame's degree of criticality exceeds a certain predetermined threshold. The criticality of a specific frame can be determined as a function of the perceived impact of the loss of that frame on the decoded speech, as estimated at the encoder. A channel-aware codec can be configured to switch between the channel impairment robust mode and a normal operating mode (that is, one in which no redundant copy is sent) in response to a channel state indication. [00150] The systems, methods, and apparatus as disclosed herein can be implemented to set the criticality threshold according to the channel quality estimate. For very good channels, the criticality threshold can be very high. As the channel quality degrades, the criticality threshold can be reduced so that more frames are considered critical. [00151] Figure 13B shows a flowchart for an implementation M120 of method M110 that includes a task T50. Task T50 calculates a criticality threshold. Task T50 can be implemented to calculate the criticality threshold based on information relating to a state of a transmission channel. This information can include one or more of the following measures, which can be updated for each of a series of time intervals: packet loss rate, packet loss fraction, number of packets expected, loss rate per second, number of packets received, validity of the loss estimate (for example, a weight measure based on a measure of sample size, such as the number of packets expected for the interval), apparent throughput, and jitter. [00152] Task T50 can also be configured to calculate more than one threshold, based on information relating to the state of the transmission channel. In such a case, decision task T100 can be configured to use information from the frame (and/or from one or more adjacent frames) to select the appropriate calculated threshold. For example, it may be desirable to use one criticality threshold to determine the criticality of a frame that is determined to contain speech, and another criticality threshold to determine the criticality of a frame that is determined to contain noise. In another example, different thresholds are used for transient (for example, onset) and stationary voiced frames, and/or for voiced and unvoiced speech frames. For a case in which more than one criticality threshold is used, task T100 can be configured to select, from among two or more criticality measures, a criticality measure that corresponds to the threshold to be used for frame n.
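As a non-limiting sketch of the multiple-threshold arrangement of paragraph [00152], the following code derives one threshold per frame class from a channel quality estimate, and the decision of task T100 then selects the measure and threshold pair for the class of frame n. The frame classes and the linear mapping are assumptions made here for illustration.

    def class_thresholds(packet_loss_rate):
        """All thresholds drop as the channel degrades; the per-class
        scale factors are illustrative assumptions."""
        base = max(0.0, 1.0 - 4.0 * packet_loss_rate)
        return {"onset": 0.5 * base, "stationary_voiced": 0.9 * base,
                "noise": 1.3 * base}

    def frame_is_critical(frame_class, measures, thresholds):
        # use the criticality measure that corresponds to the threshold
        return measures[frame_class] >= thresholds[frame_class]

    th = class_thresholds(packet_loss_rate=0.1)
    critical = frame_is_critical("onset", {"onset": 0.4}, th)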
[00153] The information that task T50 uses to calculate the threshold can include one or more of the following measures, which can be updated for each of a series of time intervals: packet loss rate, packet loss fraction, number of packets expected, loss rate per second, number of packets received, validity of the loss estimate (for example, a weight measure based on a measure of sample size, such as the number of packets expected for the interval), apparent throughput, and jitter. As mentioned above, a receiver can be configured to transmit such information back to the transmitting UE using in-band messages, using control signaling (an RTCP message is one example of such a signaling control method), and/or through another quality-of-service (QoS) feedback mechanism. Examples of information that can be provided via RTCP messages (Real-time Transport Control Protocol, as defined, for example, in the IETF specification RFC 3550) include transmitted octet count, transmitted packet count, expected packet count, number and/or fraction of packets lost, jitter (for example, delay variation), and round-trip delay. Figure 13C shows a flowchart for an implementation M130 of method M120 that includes a task T25 that receives channel state information (for example, as described above). [00154] Figures 14A and 14B show examples of relationships among channel state information, the criticality threshold value that is based on that information, and the resulting probability that a frame will be indicated as critical. In the example of figure 14B, the reported channel quality is lower than the reported channel quality in figure 14A. Consequently, the criticality threshold value in figure 14B is less selective than the criticality threshold value in figure 14A, and the resulting probability that a frame will be indicated as critical is greater. If the reported channel quality becomes too low, the resulting probability that a frame will be indicated as critical can become very high. [00155] It may be desirable to limit the number or proportion of frames that can be indicated as critical. For example, it may be desirable to balance improved performance under poor channel conditions against preserving the coding quality of the native speech and/or avoiding the capacity loss due to retransmissions that can be triggered by an overly inclusive criticality determination. [00156] One approach to limiting the retransmission frequency is to implement method M120 such that the threshold value is subject to a low clip value (that is, a lower limit value, or floor value) that sets a limit on the number of frames that can be retransmitted. For example, method M120 can be implemented to impose a minimum value on the calculated threshold value. Figure 15A shows a flowchart of such an implementation M140 of method M120 that includes a task T75. Task T75 compares a calculated candidate threshold value produced by task T50 with a limit value (for example, a low clip value). Based on the result of the comparison, task T75 selects one among (A) the calculated candidate threshold value and (B) the limit value, such that task T75 produces the selected value as the calculated threshold. For example, task T75 can be implemented to select the calculated candidate value if it is greater than (alternatively, not less than) the limit value, and to select the limit value otherwise. In this way, task T75 can be configured to clip the calculated threshold value to the limit value. Task T75 can also be configured such that when the comparison fails (for example, when clipping occurs), task T75 indicates such a condition to another module (for example, to record the condition, report the condition to the base station, and/or take another corrective action).
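The cooperation of tasks T50 and T75 described in paragraph [00156] can be sketched, without limitation, as follows; the mapping from packet loss fraction to candidate threshold is an assumption made for illustration, and in practice the floor value would be chosen to bound the retransmission frequency.

    def t50_candidate(loss_fraction):
        """Candidate criticality threshold: lower on worse channels, so
        that more frames are treated as critical (illustrative mapping)."""
        return max(0.0, 1.0 - 5.0 * loss_fraction)

    def t75_clip(candidate, floor):
        """Select the candidate if it exceeds the floor, else the floor
        itself, and report whether clipping occurred so that the condition
        can be recorded or signaled to another module."""
        clipped = candidate <= floor
        return (floor if clipped else candidate), clipped

    threshold, was_clipped = t75_clip(t50_candidate(loss_fraction=0.25), floor=0.2)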
[00157] Of course, it is also possible to implement task T100, alternatively, in such a way that the calculated value of the criticality measure is inversely proportional to criticality. In such a case, task T100 can be configured to indicate that the frame is critical when the criticality measure is below (alternatively, does not exceed) the calculated threshold value, and task T75 can be configured to compare (and possibly to clip) the calculated threshold value with a high clip value (that is, an upper limit value, or ceiling value). Figure 15B shows a flowchart for an implementation M150 of methods M130 and M140 that includes tasks T25 and T75. It is expressly noted that task T100, possibly with one or more of tasks T25, T50, and T75 as described herein (for example, any of T50+T100, T50+T75+T100, T25+T50+T100, and T25+T50+T75+T100), can be included in any of the other implementations of method M100 described herein (for example, as tasks performed before task T200). [00158] Figure 16A shows a flowchart for an implementation M600 of method M100 that includes task T400. Task T400 produces a redundant copy of the critical frame according to the reallocation candidate selected in task T300. The redundant copy typically has fewer bits than the primary copy of the critical frame in the encoded signal (that is, the copy of the critical frame as normally encoded) and can be used by a decoder to perform a forward error correction (FEC) operation to correct errors resulting from partial or complete loss of the primary copy. Task T400 can be implemented to produce the redundant copy before selection task T300 (for example, as an input parameter for calculation of the decision metric in an implementation of task T200, such as task T250) or in response to the selection of the reallocation candidate by task T300. [00159] As mentioned above, the selected reallocation candidate can indicate the allocation for the redundant copy as a number of bits or as a bit rate. Figure 16B shows a flowchart for an implementation M610 of method M600 that includes an implementation T410 of task T400. Task T410 produces a redundant copy of the critical frame that has a bit length NR (for example, Nm bits), as indicated by the selected reallocation candidate. Figure 16C shows a flowchart for an implementation M620 of method M600 that includes an implementation T420 of task T400. Task T420 produces a redundant copy of the critical frame that is encoded at a rate rR, as indicated by the selected reallocation candidate. [00160] In general, it is desirable that the redundant copy provide a good reference (for example, a good adaptive codebook) that can be used for decoding subsequent frames. The redundant copy of the critical frame can include some or all of the parameters of the primary copy of the critical frame. Task T400 can be implemented to produce the redundant copy as a reduced version of the primary copy.
For example, the primary copy may be an encoded version of the critical frame that includes components such as spectral envelope information (for example, LPC or MDCT coefficients) and/or temporal envelope information (for example, fixed codebook index, fixed codebook gain, adaptive codebook gain, pitch lag, and/or pitch gain for a CELP codec; prototype parameters and/or pitch information for a PWI or PPP codec). Task T400 can be implemented to produce the redundant copy so as to include a copy of part or all of each of one or more of these components. For example, task T400 can be implemented to produce the redundant copy so as to include one or more codebook indices that identify quantized LPC filter parameters and/or quantized temporal envelope parameters (for example, of an excitation signal). [00161] In such cases, task T400 can be implemented to assemble the redundant copy using (for example, by duplicating and/or condensing) components of a primary copy of the critical frame that have already been calculated. Task T400 can be implemented to produce the redundant copy so as to satisfy a bit constraint (for example, as task T410) or according to a frame structure associated with a rate constraint (for example, as task T420). Such a structure may specify a particular number of bits, for the frame or for each of one or more subframes of the frame, for each of a plurality of parameters such as those mentioned above (that is, LPC filter information, pitch lag, adaptive codebook index/gain, etc.). [00162] In addition, or alternatively, task T400 can be implemented to produce all or part of the redundant copy by encoding the critical frame using an encoding method that is different from the one used to produce the primary copy of the critical frame. In such a case, the different encoding method will typically have a lower rate than the method used to produce the primary copy of the critical frame (for example, using a lower-order LPC analysis, using a narrowband codec instead of a wideband codec, etc.). Such a different encoding method may use a different bit rate and/or a different encoding scheme (for example, CELP for the primary copy and PPP or PWI for the redundant copy). Figure 17A shows a flowchart for an implementation M630 of method M600 that includes an implementation T430 of task T400. Task T430 causes an encoder to produce a redundant copy of the critical frame. In one example, task T430 is implemented to provide the critical frame and the indicated allocation Nm (for example, as a number of bits, or as a bit rate) to the encoder. [00163] Figure 17B shows a flowchart for an implementation M640 of method M600 that includes an implementation T440 of task T400. Task T440 produces a copy of frame (n+k) and a redundant copy of critical frame n. Task T440 can include reallocating an initial bit allocation T to the subsequent frame into a first portion and a second portion, according to the selected reallocation candidate, and producing the copy of frame (n+k) and the redundant copy to fit the respective portions (for example, in (T−Nm) and Nm bits, respectively). [00164] Figure 17C shows a flowchart for an implementation M650 of method M600 that includes an implementation T450 of task T400. Task T450 encodes the copy of frame (n+k) into the first portion and encodes the redundant copy of critical frame n into the second portion.
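One way to realize the assembly described in paragraphs [00160] and [00161] is sketched below, without limitation: whole parameter fields of the already-computed primary copy are retained, in priority order, until the reallocated budget Nm is exhausted. The field names and bit widths are hypothetical and do not correspond to any particular codec frame structure.

    # (name, width in bits) in descending priority; widths are hypothetical
    FIELDS = [("lpc_indices", 36), ("pitch_lag", 16),
              ("acb_gain", 8), ("fcb_gain", 8), ("fcb_index", 88)]

    def assemble_redundant_copy(primary, budget_bits):
        """Copy whole fields of the primary copy while they fit within the
        budget (for example, Nm = 61 bits); return the reduced copy and the
        number of bits used."""
        copy, used = {}, 0
        for name, width in FIELDS:
            if used + width <= budget_bits:
                copy[name] = primary[name]
                used += width
        return copy, used

    # keeps lpc_indices, pitch_lag, and acb_gain (60 of 61 bits)
    redundant, used = assemble_redundant_copy({n: 0 for n, _ in FIELDS}, 61)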
[00165] In one example, the value of the initial bit allocation T is 253, which corresponds, for example, to a bit rate of 12.65 kbps (kilobits per second) and a frame length of twenty milliseconds. In another example, the value of T is 192, which corresponds, for example, to a bit rate of 9.6 kbps and a frame length of twenty milliseconds. [00166] The selection of one among a set of distributions of an allocation of T bits can be implemented as a change of the bit rate of the selected subsequent frame and a selection of a low-bit-rate scheme for encoding the redundant copy of the critical frame. For example, the distribution of the allocation of T bits as a portion of size Nm bits to carry a redundant copy of the critical frame and a portion of size (T − Nm) bits to carry a copy of the subsequent frame, where T = 253 and Nm = 61, can be applied (for example, within an AMR codec) by changing the bit rate of the subsequent frame from 12.65 kbps to a reduced bit rate of 9.6 kbps, encoding the subsequent frame according to an existing 9.6 kbps scheme, and using a 3.05 kbps scheme to encode the redundant copy of the critical frame. [00167] It may be desirable to implement several such low-bit-rate schemes for redundant encoding, each corresponding to a different one among the set of distributions. Examples of other initial bit rates include 8.85, 8.55, 6.6, 6.2, 4, 2.7, and 2 kbps, which correspond (for example, for a frame length of twenty milliseconds) to T values of 177, 171, 132, 124, 80, 54, and 40, respectively. Further examples of other initial bit rates include 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, and 12.65 kbps, which correspond (for example, for a frame length of twenty milliseconds) to T values of 477, 461, 397, 365, 317, 285, and 253, respectively. A frame can be encoded according to such a rate as described, for example, in version 10 of the AMR-WB codec referenced herein (for example, using a CELP coding model). [00168] The principles described herein can be applied to single fixed-bit-rate schemes in which each frame receives the same initial bit allocation T. These principles can also be applied to variable-bit-rate schemes (for example, multimode schemes or multiple fixed-bit-rate schemes) in which the total frame allocation of T bits can change from one frame to another. For example, the number of bits T available to encode frame (n+k) may vary according to whether the frame contains speech or noise, or according to whether the frame contains voiced or unvoiced speech, etc.
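Under the twenty-millisecond frame length assumed in these examples, an allocation in bits is simply the bit rate multiplied by the frame duration, so the split of paragraph [00166] can be checked directly:

    FRAME_MS = 20  # frame length assumed in the examples above

    def bits(rate_kbps):
        return round(rate_kbps * FRAME_MS)  # kbps x ms = bits per frame

    T = bits(12.65)             # 253-bit initial allocation
    carrier = bits(9.6)         # 192 bits for the copy of frame (n+k)
    redundant = T - carrier     # 61 bits remain for the redundant copy
    assert redundant == bits(3.05)  # an existing 3.05 kbps scheme fits exactly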
[00169] Methods M300 and M500 can be implemented to include encoding at least one of the plurality of subsequent frames (for example, a non-carrier frame) using the T bits. Such methods can also include encoding each non-carrier frame of the plurality of subsequent frames using T bits. However, it is also possible for the audio signal to include two adjacent critical frames, or two critical frames that are otherwise close to each other, such that the set of K subsequent frames relative to one critical frame overlaps (for example, has at least one frame in common with) the set of K subsequent frames relative to the other critical frame. In such a case, one of the common subsequent frames can be selected to carry a redundant copy of one critical frame, and another of the common subsequent frames can be selected to carry a redundant copy of the other critical frame, such that each of these two subsequent frames is encoded using fewer than T bits. It is also possible for a selected subsequent frame to itself be a critical frame. In some cases, for example, it can be expected that the set of K subsequent frames relative to a critical frame will include at least one other critical frame about twenty percent of the time. [00170] Task T400 can be implemented to produce the copy of frame (n+k) before selection task T300 (for example, as an input parameter for calculation of the decision metric in task T250) or in response to the selection of the reallocation candidate by task T300. Figure 18A shows a flowchart for an implementation M660 of method M610 that includes tasks TA10 and TB10. Task TA10 receives an indication of the initial allocation of T bits as a number of bits A0 assigned to frame (n+k). Task TB10 encodes the copy of frame (n+k) into A1 bits (for example, (T−Nm) bits), where A1 is less than A0. Method M660 also includes an instance of task T250 that is arranged to receive information from the copy of frame (n+k) encoded in task TB10 as an input parameter. For example, task T250 can be implemented to use the copy of frame (n+k) to calculate one or more values of a quality-change decision metric, as described herein. [00171] Figure 18B shows a flowchart for an implementation M670 of method M620 that includes tasks TA20 and TB20. Task TA20 receives an indication of the initial allocation of T bits as a rate selection r0 for frame (n+k). Task TB20 encodes the copy of frame (n+k) according to a rate r1 that is less than r0. Method M670 also includes an instance of task T250 that is arranged to receive information from the copy of frame (n+k) encoded in task TB20 as an input parameter. For example, task T250 can be implemented to use the copy of frame (n+k) to calculate one or more values of a quality-change decision metric, as described herein. [00172] Figure 18C shows a flowchart for an implementation M700 of method M600 that includes a task T500. Task T500 produces a packet that contains a copy of the subsequent frame (n+k) and the redundant copy of critical frame n as produced by task T400. Figure 19A shows a flowchart for an implementation M710 of methods M610 and M700. Figure 19B shows a flowchart for an implementation M720 of methods M620 and M700. It may be desirable for the packet to include information indicating that it carries a redundant copy of the critical frame, indicating the value of the offset k, and/or indicating the number of reallocated bits Nm. Alternatively, such information may be derivable by the decoder from other information in the encoded signal. [00173] A packet can include one or more frames. It may be desirable to limit the packet length to 20 milliseconds (for example, to reduce delay). Figure 20A shows an example of the overhead for an encoded packet using a typical protocol stack for VoIP communications that includes Internet Protocol version 4 (IPv4), User Datagram Protocol (UDP), and RTP. Figure 20B shows a similar example for an IP version 6 (IPv6) packet. Examples of payload size include 160 bytes for a G.711 codec, 20 bytes for a G.729 codec, and 24 bytes for a G.723.1 codec.
Other codecs that can be used with a method of bit reallocation for redundant encoding as described herein include, without limitation, G.726, G.728, G.729A, AMR, AMR-WB, AMR-WB+ (for example, as described in 3GPP TS 26.290 v10.0.0, March 2011), VMR-WB (3GPP2 C.S0052-0, Service Options 62 and 63), the Enhanced Variable Rate Codec (EVRC, as described in Third Generation Partnership Project 2 (3GPP2) document C.S0014-C, v1.0, entitled "Enhanced Variable Rate Codec, Speech Service Options 3, 68 and 70 for Wideband Spread Spectrum Digital Systems", February 2007 (available online at www-dot-3gpp2-dot-org)), the Selectable Mode Vocoder speech codec (as described in 3GPP2 document C.S0030-0, v3.0, entitled "Selectable Mode Vocoder (SMV) Service Option for Wideband Spread Spectrum Communication Systems", January 2004 (available online at www-dot-3gpp2-dot-org)), and the Enhanced Voice Services codec (EVS, for example, as described in 3GPP TR 22.813 v10.0.0 (March 2010), available from ETSI). [00174] Figure 21 shows an example of a payload for an RTP packet that carries a redundant copy of a critical frame and a copy of a frame that is subsequent to the critical frame. The redundant copy (bits r(0) to r(176)) is encoded in the AMR-WB 8.85 kbps mode, as indicated by the value of one for the corresponding frame type indicator FT, and the copy of the subsequent frame (bits p(0) to p(131)) is encoded in the AMR-WB 6.6 kbps mode, as indicated by the value of zero for the corresponding frame type indicator FT. In this example, the codec mode request indicator CMR requests that the encoder at the receiving terminal adopt the 8.85 kbps mode, and the payload ends with three padding bits P to fill the last octet. In other examples, the payload may contain more than two encoded frames, and/or the bits of the redundant copy may precede the bits of the copy of the subsequent frame in the packet (with the order of the corresponding table-of-contents entries for the copies being switched accordingly). [00175] It may be desirable to use header compression: for example, to compress the RTP header from twelve bytes to four bytes. The RTP header includes a time stamp, which can be used to calculate transmission time, and a sequence number, which can be used to correctly order packets received out of order and/or to detect packet loss. Robust header compression (ROHC, as described in IETF RFC 3095, RFC 3843, and/or RFC 4815) can be used to support higher compression rates (for example, compression of one or more, and possibly all, packet headers to 1 to 4 bytes).
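The three padding bits in the payload of figure 21 can be verified by counting field widths, assuming the bandwidth-efficient payload format of the AMR-WB RTP specification (IETF RFC 4867), in which the CMR field occupies four bits and each table-of-contents entry six bits; the frame sizes are those given above.

    CMR_BITS = 4                # codec mode request field
    TOC_ENTRY_BITS = 6          # per carried frame: F, FT (4 bits), Q
    speech_bits = 177 + 132     # 8.85 kbps redundant copy + 6.6 kbps copy

    total = CMR_BITS + 2 * TOC_ENTRY_BITS + speech_bits  # 325 bits
    padding = -total % 8                                 # up to next octet
    assert padding == 3  # the three padding bits P that close the payload
    # total + padding = 328 bits = 41 octets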
[00176] Figure 22 is a block diagram of an implementation AD20 of audio decoder AD10. Audio decoder AD20 can be implemented as part of a vocoder, as an independent entity, or distributed across one or more entities within receiving terminal 104. Audio decoder AD20 can also be implemented as part of a VoIP client. [00177] Audio decoder AD20 will be described below in terms of its functionality. Audio decoder AD20 can be implemented as hardware, firmware, software, or any combination thereof, and the manner in which it is realized may depend on the particular implementation and the design constraints imposed on the overall system. For example, audio decoder AD20 can be implemented with a microprocessor, a digital signal processor (DSP), programmable logic, dedicated hardware, or any other hardware- and/or software-based processing entity. [00178] In this example, audio decoder AD20 includes a de-jitter buffer DB10 (also called a "jitter buffer"). De-jitter buffer DB10 can be a hardware device or software process that reduces or eliminates jitter caused by variations in packet arrival time (due, for example, to network congestion, timing drift, and/or route changes). De-jitter buffer DB10 can receive audio frames in packets. De-jitter buffer DB10 can be implemented to delay newly arrived packets so that frames from previously arrived packets can be provided continuously to frame decoder FD20, in the correct order (for example, as indicated by the time stamps of the packets), resulting in smooth playback with little audio distortion. De-jitter buffer DB10 can be fixed or adaptive. A fixed de-jitter buffer introduces a fixed delay to packets. An adaptive de-jitter buffer, on the other hand, can adapt to changes in network delay. De-jitter buffer DB10 can provide encoded audio frames (for example, including the indices XL, XF, XG, and XP) to frame decoder FD20 in the appropriate order. [00179] If a copy of a frame is not received by the de-jitter buffer, a frame loss can result if FEC is not used. When FEC is used and the primary copy of the frame to be played is lost, de-jitter buffer DB10 can determine whether a redundant copy of the frame is in the buffer. If a redundant copy of the current frame is available, the redundant copy can be provided to frame decoder FD20 for decoding to generate audio samples. [00180] In addition, de-jitter buffer DB10 can be modified to process a primary frame (that is, the original critical frame) and a redundant frame (that is, a copy of part or all of the original critical frame) differently. Buffer DB10 can process these two frames differently, so that the average delay associated with implementing an FEC operation as described herein is no greater than the average delay when the FEC operation is not implemented. For example, buffer DB10 can be implemented to detect that an incoming packet contains a redundant copy (for example, that the packet contains two frames) and to initiate decoding of the redundant copy in response to that detection. [00181] Audio frames released from de-jitter buffer DB10 can be provided to frame decoder FD20 to generate decoded core audio frames DF (for example, synthesized speech). In general, frame decoder FD20 can be implemented to perform any speech decoding method known in the art for producing synthesized speech. In the example of figure 22, frame decoder FD20 uses a CELP decoding method that corresponds to the encoding method described above with reference to figure 3. In this example, fixed code vector generator VG10 decodes the FCB index XF and a corresponding portion of the gain index XG to produce fixed code vectors for each subframe; inverse quantizer IA10 and vector generator A50 decode the ACB index XP and a corresponding portion of the gain index XG to produce adaptive code vectors for each subframe; and adder AD10 combines the corresponding code vectors to produce the excitation signal and to update memory ME10 (for example, as described in steps 1-8 of section 6.1 of 3GPP TS 26.190 v10.0.0).
Inverse quantizer IL10 and inverse transform module IM10 decode the LPC index XL to produce LP filter coefficient vectors, which are applied to the excitation by synthesis filter SF10 to produce a synthesized signal (for example, as described in the opening paragraph and step 4 of section 6.1 of 3GPP TS 26.190 v10.0.0). The raw synthesized signal is provided to post-filter PF10, which can be implemented to perform operations such as high-pass filtering, upsampling, and interpolation (for example, as described in section 6.2 of 3GPP TS 26.190 v10.0.0) to produce the decoded core audio frames DF. Alternatively, and without limitation, frame decoder FD20 can use NELP or PPP frame decoding methods. [00182] Redundant copies of frames that include some (that is, a partial set) of the parameter values of the primary copy can be passed from de-jitter buffer DB10 to a partial frame decoding module. For example, frame decoder FD20 can be implemented to generate a frame corresponding to the critical frame (for example, according to an error concealment procedure, as described in 3GPP TS 26.091, v10.0.0, as mentioned above) before the redundant copy is available. In this case, frame decoder FD20 may include a partial frame decoding module that is configured to update memory ME10 (for example, according to the fixed and adaptive codebook indices and gains of the redundant copy) before decoding the carrier frame (n+k). [00183] In one configuration, the copy of the subsequent frame (n+k) and the redundant copy of the critical frame n are packed into the same RTP packet and transmitted to receiving terminal 104. In another configuration, the copy of the subsequent frame and the redundant copy of the critical frame, although they can be generated at the same time, are packed into different corresponding RTP packets and transmitted to the receiving terminal. The decision of which format to use can be based on the capabilities of both terminals. If both formats are supported by each terminal, for example, the format that supports a lower data rate can be used. [00184] On the receiver side, the speech frames can be stored in de-jitter buffer DB10, which can be adaptive. As mentioned earlier, de-jitter buffer DB10 can be designed so that the average delay of speech frames is no greater than the average delay without FEC techniques. Frames can be sent from de-jitter buffer DB10 to a frame decoder (for example, decoder FD20) in the proper order. If the redundant copy is a partial set of the parameters of the primary copy, a partial frame decoding module can be used. [00185] A source-controlled (and possibly channel-controlled) FEC scheme as described herein can reduce the number of packet losses and the burstiness of losses with little or no increase in the data rate. Identification of critical frames can help ensure a good trade-off between perceptual speech quality and data rate. Such an FEC scheme can be implemented to use the available bandwidth efficiently and to be compatible with legacy communication devices.
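A highly simplified, non-limiting sketch of the buffer behavior described in paragraphs [00179], [00180], and [00184] follows; the packet fields and the class interface are assumptions made for illustration and do not represent any particular implementation of buffer DB10.

    class DejitterBufferSketch:
        """On playout, prefer the primary copy of a frame; if it is missing,
        fall back to a redundant copy that arrived piggybacked on a later
        packet; otherwise report an erasure for concealment."""

        def __init__(self):
            self.primary, self.redundant = {}, {}

        def push(self, packet):
            self.primary[packet["seq"]] = packet["frame"]
            if "redundant" in packet:  # copy of the frame sent offset_k earlier
                key = packet["seq"] - packet["offset_k"]
                self.redundant[key] = packet["redundant"]

        def pop(self, seq):
            if seq in self.primary:
                return "primary", self.primary.pop(seq)
            if seq in self.redundant:              # FEC path (may be partial)
                return "redundant", self.redundant.pop(seq)
            return "erased", None                  # conceal at the decoder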
[00186] Audio encoder AE10 can be implemented to include a dynamic rate control module. Such a module can implement two steps to approximate a predetermined target rate. In the first step, two adjacent operating points are determined. These two adjacent operating points, which can be data rates, are chosen so that the value of the target data rate lies between the values of the two operating points. The target data rate can be specified externally, based on capacity demands. Alternatively, the target data rate can be specified internally, based, for example, on channel state information. Such rate control can be implemented to allow an FEC scheme as described herein to be carried out at any specified data rate, so that operators can decide the data rate based on capacity demand. [00187] Figure 23A shows a block diagram of an apparatus MF100 according to a general configuration. Apparatus MF100 includes means F200 for calculating at least one value of a decision metric, based on information from a frame of an audio signal that is subsequent in the audio signal to a critical frame of the audio signal (the "subsequent frame" or "carrier frame") (for example, as described herein with reference to task T200). Apparatus MF100 also includes means F300 for selecting one among a plurality of reallocation candidates, wherein the selected reallocation candidate indicates a reallocation of an initial bit allocation T to the subsequent frame into a first portion and a second portion (for example, as described herein with reference to task T300). [00188] Figure 23B shows a block diagram of an implementation MF300 of apparatus MF100. Apparatus MF300 includes an implementation F220 of means F200 that is for calculating at least one value of a decision metric for each of a plurality of frames (for example, as described herein with reference to task T220). Apparatus MF300 also includes an implementation F350 of means F300 that is for selecting one among a plurality of reallocation candidates and one among a plurality of frames (for example, by selecting the corresponding value of offset k, as described herein with reference to task T350). [00189] Figure 23C shows a block diagram of an implementation MF500 of apparatus MF100. Apparatus MF500 includes an implementation F270 of means F200 that is for calculating a plurality of sets of values of a decision metric (for example, as described herein with reference to task T270). Apparatus MF500 also includes an instance of means F350. [00190] Figure 24A shows a block diagram of an implementation MF140 of apparatus MF100. Apparatus MF140 includes means F50 for calculating a criticality threshold (for example, as described herein with reference to task T50), means F75 for comparing the calculated criticality threshold with a limit value (for example, as described herein with reference to task T75), and means F100 for determining that frame n is critical (for example, as described herein with reference to task T100). [00191] Figure 24B shows a block diagram of an implementation MF150 of apparatus MF140. Apparatus MF150 includes means F25 for receiving channel state information (for example, as described herein with reference to task T25). As described herein, channel state information, which can indicate the quality of the channel used for transmissions between transmitting terminal 102 and receiving terminal 104, can be collected and evaluated at receiving terminal 104 and transmitted back to transmitting terminal 102. [00192] Figure 25A shows a block diagram of an apparatus A100 according to a general configuration that includes a calculator 200 and a selector 300. Calculator 200 is configured to calculate at least one value of a decision metric, based on information from a frame of an audio signal that is subsequent in the audio signal to a first frame of the audio signal (for example, as described herein with reference to task T200).
Selector 300 is configured to select one among a plurality of reallocation candidates, based on at least one calculated value of the decision metric (for example, as described herein with reference to task T300), where the selected reallocation candidate indicates a reallocation of an initial bit allocation T to the subsequent frame into a first portion and a second portion. Apparatus A100 can also be implemented to include a frame encoder configured to produce a redundant copy of the first frame (for example, frame encoder FE20), a packet assembler configured to produce a packet containing a copy of the subsequent frame and the redundant copy (for example, as described herein with reference to task T500), and/or a critical frame indicator configured to determine that the first frame is a critical frame (for example, as described herein with reference to task T100). [00193] Figure 25B shows a block diagram of an implementation A300 of apparatus A100. Apparatus A300 includes an implementation 220 of calculator 200 that is configured to calculate at least one value of a decision metric for each of a plurality of frames (for example, as described herein with reference to task T220). Apparatus A300 also includes an implementation 350 of selector 300 that is configured to select one among a plurality of reallocation candidates and one among a plurality of frames (for example, by selecting the corresponding value of offset k, as described herein with reference to task T350). [00194] Figure 25C shows a block diagram of an implementation A500 of apparatus A100. Apparatus A500 includes an implementation 270 of calculator 200 that is configured to calculate a plurality of sets of values of a decision metric (for example, as described herein with reference to task T270). Apparatus A500 also includes an instance of selector 350. [00195] Figure 20C shows a block diagram of a communications device D10 that includes a chip or chipset CS10 (for example, a mobile station modem (MSM) chipset) that embodies the elements of apparatus A100 (or MF100). Chip/chipset CS10 can include one or more processors, which can be configured to execute software and/or firmware of part of apparatus A100 or MF100 (for example, as instructions). Transmitting terminal 102 can be realized as an implementation of device D10. [00196] Chip/chipset CS10 includes a receiver (for example, RX10) that is configured to receive a radio-frequency (RF) communications signal and to decode and reproduce an audio signal encoded within the RF signal, and a transmitter (for example, TX10) that is configured to transmit an RF communications signal that describes an encoded audio signal (for example, as produced by task T500). Such a device can be configured to transmit and receive voice communication data wirelessly, using any one or more of the codecs referenced herein. [00197] Device D10 is configured to receive and transmit RF communications signals through an antenna C30. Device D10 may also include a diplexer and one or more power amplifiers in the path to antenna C30. Chip/chipset CS10 is also configured to receive user input via keypad C10 and to display information through display C20. In this example, device D10 also includes one or more antennas C40 to support Global Positioning System (GPS) location services and/or short-range communications with an external device, such as a wireless headset (for example, Bluetooth™).
In another example, such a communications device is itself a Bluetooth™ headset and lacks keypad C10, display C20, and antenna C30. [00198] Communications device D10 can be embodied in a variety of communication devices, including smartphones and laptop and tablet computers. Figure 26 shows front, rear, and side views of one such example: handset H100 (for example, a smartphone) having voice microphones MV10-1 and MV10-3 arranged on the front face, a voice microphone MV10-2 arranged on the rear face, another microphone ME10 (for example, for improved directional selectivity and/or to capture acoustic error at the user's ear as input to an active noise cancellation operation) located in a top corner of the front face, and another microphone MR10 (for example, for improved directional selectivity and/or to capture a background noise reference) located on the rear face. Loudspeaker LS10 is arranged in the center of the top of the front face near error microphone ME10, and two other loudspeakers LS20L, LS20R are also provided (for example, for speakerphone applications). The maximum distance between the microphones of such a handset is typically about ten or twelve centimeters. [00199] Figure 25D shows a block diagram of a wireless device 1102 that can be implemented to perform a method as described herein (for example, any one or more of methods M100, M200, M300, M400, M500, M600, and M700). Transmitting terminal 102 can be realized as an implementation of wireless device 1102. Wireless device 1102 can be a remote station, access terminal, telephone, personal digital assistant (PDA), cellular telephone, etc. [00200] Wireless device 1102 includes a processor 1104 that controls the operation of the device. Processor 1104 can also be referred to as a central processing unit (CPU). Memory 1106, which can include both read-only memory (ROM) and random-access memory (RAM), provides instructions and data to processor 1104. A portion of memory 1106 may also include non-volatile random-access memory (NVRAM). Processor 1104 typically performs logical and arithmetic operations based on program instructions stored in memory 1106. The instructions in memory 1106 can be executable to implement the method or methods as described herein. [00201] Wireless device 1102 includes a housing 1108 that can include a transmitter 1110 and a receiver 1112 to allow transmission and reception of data between wireless device 1102 and a remote location. Transmitter 1110 and receiver 1112 can be combined into a transceiver 1114. An antenna 1116 can be attached to housing 1108 and electrically coupled to transceiver 1114. Wireless device 1102 may also include (not shown) multiple transmitters, multiple receivers, multiple transceivers, and/or multiple antennas. [00202] In this example, wireless device 1102 also includes a signal detector 1118 that can be used to detect and quantify the level of signals received by transceiver 1114. Signal detector 1118 can detect such signals as total energy, pilot energy per pseudonoise (PN) chip, power spectral density, and other signals. Wireless device 1102 also includes a digital signal processor (DSP) 1120 for use in processing signals. [00203] The various components of wireless device 1102 are coupled together by a bus system 1122, which can include a power bus, a control signal bus, and a status signal bus in addition to a data bus. For the sake of clarity, the various buses are illustrated in figure 25D as the bus system 1122.
[00204] The methods and apparatus described herein can be applied generally in any transceiving and/or audio sensing application, especially in mobile or otherwise portable instances of such applications. For example, the range of configurations described herein includes communications devices that reside in a wireless telephony communication system configured to employ a code-division multiple-access (CDMA) over-the-air interface. However, those skilled in the art will understand that a method and apparatus having features as described herein can reside in any of the various communication systems employing a wide range of technologies known to those of skill in the art, such as systems employing Voice over IP (VoIP) over wired and/or wireless (for example, CDMA, TDMA, FDMA, and/or TD-SCDMA) transmission channels.
[00205] It is expressly contemplated and hereby disclosed that the communications devices described herein can be adapted for use in networks that are packet-switched (for example, wired and/or wireless networks arranged to carry audio transmissions according to protocols such as VoIP) and/or circuit-switched. It is also expressly contemplated and hereby disclosed that the communications devices described herein can be adapted for use in narrowband coding systems (for example, systems that encode an audio frequency range of about four or five kilohertz) and/or for use in wideband coding systems (for example, systems that encode audio frequencies greater than five kilohertz), including whole-band coding systems and split-band coding systems.
[00206] The presentation of the described configurations is provided to enable any person skilled in the art to make or use the methods and other structures disclosed herein. The flowcharts, block diagrams, and other structures shown and described herein are examples only, and other variants of these structures are also within the scope of the disclosure. Various modifications to these configurations are possible, and the general principles presented herein can be applied to other configurations as well. Thus, the present description is not intended to be limited to the configurations shown above but rather is to be accorded the widest scope consistent with the principles and novel features disclosed in any fashion herein, including in the appended claims as filed, which form a part of the original disclosure.
[00207] Those of skill in the art will understand that information and signals can be represented using any of a variety of different technologies and techniques. For example, data, instructions, commands, information, signals, bits, and symbols that may be referenced throughout the above description can be represented by voltages, currents, electromagnetic waves, magnetic fields or particles, optical fields or particles, or any combination thereof.
[00208] Important design requirements for implementation of a configuration as disclosed herein may include minimizing processing delay and/or computational complexity (typically measured in millions of instructions per second, or MIPS), especially for computation-intensive applications, such as playback of compressed audio or audiovisual information (for example, a file or stream encoded according to a compression format, such as one of the examples identified herein) or applications for wideband communications (for example, voice communications at sampling rates higher than eight kilohertz, such as 12, 16, 32, 44.1, 48, or 192 kHz).
[00209] An apparatus as disclosed herein (for example, any of devices MF100, MF110, MF120, MF200, MF210, MF300, AP100, AP110, AP120, AP200, AP210, AP300, and AP400) can be implemented in any combination of hardware with software, and/or with firmware, that is deemed suitable for the intended application. For example, the elements of such an apparatus can be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements can be implemented as one or more such arrays. Any two or more, or even all, of these elements can be implemented within the same array or arrays. Such an array or arrays can be implemented within one or more chips (for example, within a chipset including two or more chips).
[00210] One or more elements of the various implementations of the apparatus disclosed herein (for example, any of devices MF100, MF110, MF120, MF200, MF210, MF300, AP100, AP110, AP120, AP200, AP210, AP300, and AP400) can be implemented, in whole or in part, as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs (field-programmable gate arrays), ASSPs (application-specific standard products), and ASICs (application-specific integrated circuits). Any of the various elements of an implementation of an apparatus as disclosed herein can also be embodied as one or more computers (for example, machines including one or more arrays programmed to execute one or more sets or sequences of instructions, also called "processors"), and any two or more, or even all, of these elements can be implemented within the same such computer or computers.
[00211] A processor or other means for processing as disclosed herein can be fabricated as one or more electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or logic gates, and any of these elements can be implemented as one or more such arrays. Such an array or arrays can be implemented within one or more chips (for example, within a chipset including two or more chips). Examples of such arrays include fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, DSPs, FPGAs, ASSPs, and ASICs. A processor or other means for processing as disclosed herein can also be embodied as one or more computers (for example, machines including one or more arrays programmed to execute one or more sets or sequences of instructions) or other processors. It is possible for a processor as described herein to be used to perform tasks or execute other sets of instructions that are not directly related to a procedure of an implementation of method M100, such as a task relating to another operation of a device or system in which the processor is embedded (for example, an audio sensing device). It is also possible for part of a method as described herein to be performed by a processor of the audio sensing device and for another part of the method to be performed under the control of one or more other processors.
[00212] Those of skill in the art will appreciate that the various illustrative modules, logical blocks, circuits, and tests and other operations described in connection with the configurations disclosed herein can be implemented as electronic hardware, computer software, or combinations of both. Such modules, logical blocks, circuits, and operations can be implemented or performed with a general-purpose processor, a digital signal processor (DSP), an ASIC or ASSP, an FPGA or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to produce the configuration as disclosed herein. For example, such a configuration can be implemented at least in part as a hard-wired circuit, as a circuit configuration fabricated into an application-specific integrated circuit, or as a firmware program loaded into non-volatile storage or a software program loaded from or into a data storage medium as machine-readable code, such code being instructions executable by an array of logic elements such as a general-purpose processor or other digital signal processing unit. A general-purpose processor can be a microprocessor, but in the alternative, the processor can be any conventional processor, controller, microcontroller, or state machine. A processor can also be implemented as a combination of computing devices, for example, a combination of a DSP and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a DSP core, or any other such configuration. A software module can reside in a non-transitory storage medium, such as RAM (random-access memory), ROM (read-only memory), non-volatile RAM (NVRAM) such as flash RAM, erasable programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), registers, a hard disk, a removable disk, or a CD-ROM, or in any other form of storage medium known in the art. An illustrative storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium can be integral to the processor. The processor and the storage medium can reside in an ASIC. The ASIC can reside in a user terminal. In the alternative, the processor and the storage medium can reside as discrete components in a user terminal.
[00213] It should be noted that the various methods disclosed herein (for example, any of methods M100, M110, M120, M200, M210, and M300) can be performed by an array of logic elements, such as a processor, and that the various elements of an apparatus as described herein can be implemented as modules designed to execute on such an array. As used herein, the term "module" or "submodule" can refer to any method, apparatus, device, unit, or computer-readable data storage medium that includes computer instructions (for example, logical expressions) in software, hardware, or firmware form. It should be understood that multiple modules or systems can be combined into one module or system, and that one module or system can be separated into multiple modules or systems to perform the same functions. When implemented in software or other computer-executable instructions, the elements of a process are essentially the code segments that perform the related tasks, such as routines, programs, objects, components, data structures, and the like.
The term "software" is to be understood as including source code, assembly language code, machine code, binary code, firmware, macrocode, microcode, any one or more sets or sequences of instructions executable by a set of logical elements , and any combination of such examples. The program or code segments can be stored in a processor-readable medium or transmitted by a computer data signal embedded in a carrier wave through a transmission medium or communication link. [00214] The implementations of methods, schemes and techniques described herein can also be incorporated in a tangible form (for example, in tangible computer-readable resources of one or more computer-readable storage media, as listed here) as one or more sets of instructions executable by a machine, including an array of logic elements (for example, a processor, microprocessor, microcontroller, or other finite state machine). The term "computer-readable medium" can include any medium capable of storing or transferring information, including volatile, non-volatile, removable and non-removable storage media. Examples of a computer-readable medium include an electronic circuit, a semiconductor memory device, a ROM, a flash memory, an erasable ROM (EROM), a floppy disk or other magnetic storage device, a CD-ROM / DVD or other optical storage, a hard disk or any other medium that can be used to store the desired information, a fiber optic medium, a radio frequency (RF) link, or any other medium that can be used to carry the desired information and can be accessed. The computer data signal can include any signal that can propagate through a transmission medium, such as electronic network channels, optical fibers, air, electromagnetic, RF links, etc. Code segments can be downloaded via computer networks, such as the Internet or an intranet. In any case, the scope of the present description should not be considered as limited by these modalities. [00215] Each of the tasks of the methods described here can be performed directly in hardware, in a software module executed by a processor, or in a combination of the two. In a typical application of a method implementation as described here, a set of logic elements (for example, logic gates) is configured to execute one, more than one, or even all the various functions of the method. One or more (possibly all) of the tasks can also be implemented as code (for example, one or more sets of instructions), incorporated into a computer program product (for example, one or more data storage media, such as floppy disks, flash or other non-volatile memory cards, semiconductor memory chips, etc.), which can be read and / or executed by a machine (for example, a computer), including an array of logic elements (for example, processor, microprocessor, microcontroller, or other finite state machine). The tasks of implementing a method as described herein, can also be performed by more than one such matrix or machine. In these or other implementations, tasks can be performed within a wireless communication device, such as a cell phone or other device with this communication capability. Such a device can be configured to communicate with circuit-switched and / or packet-switched networks (for example, using one or more protocols, such as VoIP). For example, such a device can include RF circuits configured to receive and / or transmit encrypted frames. 
[00216] It is expressly disclosed that the various methods described herein can be performed by a portable communications device, such as a handset, a headset, or a portable digital assistant (PDA), and that the various apparatus described herein can be included within such a device. A typical real-time (for example, online) application is a telephone conversation conducted using such a mobile device.
[00217] In one or more exemplary embodiments, the operations described herein can be implemented in hardware, software, firmware, or any combination thereof. If implemented in software, such operations can be stored on or transmitted over a computer-readable medium as one or more instructions or code. The term "computer-readable media" includes both computer-readable storage media and communication (for example, transmission) media. By way of example, and not limitation, computer-readable storage media can comprise an array of storage elements, such as semiconductor memory (which may include, without limitation, dynamic or static RAM, ROM, EEPROM, and/or flash RAM), or ferroelectric, magnetoresistive, ovonic, polymeric, or phase-change memory; CD-ROM or other optical disk storage; and/or magnetic disk storage or other magnetic storage devices. Such storage media can store information in the form of instructions or data structures that can be accessed by a computer. Communication media can comprise any medium that can be used to carry desired program code in the form of instructions or data structures and that can be accessed by a computer, including any medium that facilitates transfer of a computer program from one place to another. Also, any connection is properly termed a computer-readable medium. For example, if the software is transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technology such as infrared, radio, and/or microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technology such as infrared, radio, and/or microwave is included in the definition of medium. Disk and disc, as used herein, include compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk, and Blu-ray™ disc (Blu-ray Disc Association, Universal City, CA), where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
[00218] An acoustic signal processing apparatus as described herein can be incorporated into an electronic device, such as a communications device, that accepts speech input in order to control certain operations, or that may otherwise benefit from separation of desired sounds from background noise. Many applications can benefit from enhancing or separating clear desired sound from background sounds originating from multiple directions. Such applications can include human-machine interfaces in electronic or computing devices that incorporate capabilities such as voice recognition and detection, speech enhancement and separation, voice-activated control, and the like. It may be desirable to implement such an acoustic signal processing apparatus to be suitable in devices that provide only limited processing capabilities.
[00219] The elements of the various implementations of the modules, elements, and devices described herein can be fabricated as electronic and/or optical devices residing, for example, on the same chip or among two or more chips in a chipset. One example of such a device is a fixed or programmable array of logic elements, such as transistors or gates. One or more elements of the various implementations of the apparatus described herein can be implemented, in whole or in part, as one or more sets of instructions arranged to execute on one or more fixed or programmable arrays of logic elements, such as microprocessors, embedded processors, IP cores, digital signal processors, FPGAs, ASSPs, and ASICs.
[00220] It is possible for one or more elements of an implementation of an apparatus as described herein to be used to perform tasks or execute other sets of instructions that are not directly related to the operation of the apparatus, such as a task relating to another operation of a device or system in which the apparatus is embedded. It is also possible for one or more elements of an implementation of such an apparatus to have structure in common (for example, a processor used to execute portions of code corresponding to different elements at different times, a set of instructions executed to perform tasks corresponding to different elements at different times, or an arrangement of electronic and/or optical devices performing operations for different elements at different times).
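As a purely hypothetical illustration of the packet referenced above (as produced by the packet assembler of task T500), the sketch below concatenates the encoding of the subsequent frame (the first bit portion) and the redundant copy of the critical frame (the second bit portion) behind a minimal header. This header layout, and the helper names, are assumptions made for illustration only; they are not a wire format defined by this disclosure or by any codec referenced herein.

```python
# Hypothetical packet layout for illustration; not a format defined herein.
import struct

def assemble_packet(primary, redundant, offset_k):
    """Pack the subsequent frame's coding (first portion) and the redundant
    copy of the critical frame (second portion) behind a 3-byte header:
    1 byte for offset k (frames back to the critical frame) and 2 bytes
    for the length of the primary coding."""
    header = struct.pack("!BH", offset_k, len(primary))
    return header + bytes(primary) + bytes(redundant)

def parse_packet(packet):
    """Receiver-side inverse, for example for a de-jitter buffer that must
    route the redundant copy to the earlier frame it protects."""
    offset_k, primary_len = struct.unpack("!BH", packet[:3])
    primary = packet[3:3 + primary_len]
    redundant = packet[3 + primary_len:]
    return offset_k, primary, redundant
```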
Claims (15)
[0001] 1. Method (M100, M200, M300, M400) of processing an audio signal, the method characterized by comprising: calculating (T200, T210, T220, T250) at least one value of a decision metric for a second frame of the audio signal that is subsequent in the audio signal to a first frame of the audio signal; and selecting (T300, T350), based on the at least one calculated value of the decision metric, one among a plurality of relocation candidates; wherein the at least one calculated value is based on a compressibility measure of the second frame, and wherein the selected relocation candidate indicates a reallocation of an initial bit allocation for the second frame into a first bit portion that is used to encode the second frame and a second bit portion that is used to encode a redundant copy of the first frame.
[0002] 2. Method according to claim 1, characterized in that the method includes determining that the first frame is a critical frame of the audio signal.
[0003] 3. Method according to claim 2, characterized in that determining that the first frame is a critical frame is based on information from an encoded version of a frame of the audio signal that is subsequent in the audio signal to the first frame.
[0004] 4. Method according to claim 3, characterized in that the encoded version is an encoded version of the second frame.
[0005] 5. Method according to any one of claims 2 to 4, characterized in that said determining comprises comparing a criticality measure with a criticality limit; and preferably wherein the method comprises calculating the criticality limit based on information relating to a state of a transmission channel; and still preferably wherein calculating the criticality limit includes: comparing a value calculated based on the information relating to the state of the transmission channel with a limit value; and selecting, in response to a result of the comparison with the limit value, the limit value as the criticality limit.
[0006] 6. Method according to any one of claims 1 to 5, characterized in that the compressibility measure indicates a correlation between subframes of the second frame.
[0007] 7. Method according to any one of claims 1 to 6, characterized in that selecting one among the plurality of relocation candidates comprises comparing a calculated value of the decision metric with each of an ordered plurality of decision limits, and wherein each of the ordered plurality of decision limits corresponds to a different candidate among the plurality of relocation candidates.
[0008] 8. Method according to any one of claims 1 to 7, characterized in that the method comprises calculating a plurality of values of the decision metric, each corresponding to a different frame of the audio signal that is subsequent in the audio signal to the first frame, wherein each of the plurality of values of the decision metric is based on a compressibility measure of the corresponding frame, and wherein the method further comprises selecting, based on at least some of the plurality of values of the decision metric, the second frame from among the different frames.
[0009] 9. Method according to any one of claims 1 to 5, characterized in that the at least one calculated value of the decision metric comprises a set of calculated values of the decision metric, and wherein each of the set of calculated values corresponds to a different candidate among the plurality of relocation candidates.
[0010] 10. Method according to claim 9, characterized in that each of the set of calculated values is based on a perceptual quality measure associated with the corresponding candidate among the plurality of relocation candidates; and preferably wherein the compressibility measure is based on information from an encoded version of the second frame; and preferably wherein the at least one calculated value is based on a relation between perceptual quality measures of the second frame for different coding rates; and still preferably wherein the at least one calculated value is based on a relation between (A) a compressibility measure of the second frame for the initial bit allocation and (B) a compressibility measure of the second frame for the corresponding candidate among the plurality of relocation candidates.
[0011] 11. Method according to any one of claims 1 to 5, characterized in that the at least one calculated value of the decision metric comprises a plurality of sets of calculated values of the decision metric, each of the plurality of sets corresponding to a different frame of the audio signal that is subsequent in the audio signal to the first frame, and wherein, within each set, each value corresponds to a different candidate among the plurality of relocation candidates.
[0012] 12. Method according to claim 11, characterized in that, within each set, each value is based on a perceptual quality measure associated with the corresponding candidate among the plurality of relocation candidates; and preferably wherein, within each set, each value is based on information from an encoded version of the corresponding frame; and still preferably wherein the method comprises, based on the calculated values of at least some of the plurality of sets, selecting the second frame from among the different frames.
[0013] 13. Method according to any one of claims 1 to 12, characterized in that the method comprises, in response to selecting the one among the plurality of relocation candidates, producing a packet that includes the redundant copy of the first frame and a copy of the second frame, wherein the copy of the second frame is encoded into the first portion.
[0014] 14. Apparatus (MF100, MF300, MF500) for processing an audio signal, the apparatus characterized by comprising: means (F200, F220, F270) for calculating at least one value of a decision metric for a second frame of the audio signal that is subsequent in the audio signal to a first frame of the audio signal; and means (F300, F350) for selecting, based on the at least one calculated value of the decision metric, one among a plurality of relocation candidates; wherein the at least one calculated value is based on a compressibility measure of the second frame, and wherein the selected relocation candidate indicates a reallocation of an initial bit allocation for the second frame into a first bit portion that is used to encode the second frame and a second bit portion that is used to encode a redundant copy of the first frame.
[0015] 15. Memory characterized by comprising instructions that cause a machine to perform the method as defined in any one of claims 1 to 13.
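To make the threshold-control language of claims 2 and 5 concrete, the following minimal sketch, under an assumed channel-state mapping and assumed constants, computes a criticality limit from transmission-channel state (here, a packet loss rate), clamps it at a fixed limit value, and treats a frame as critical when its criticality measure exceeds the resulting limit. Every name and number below is an assumption for illustration, not a value from this disclosure.

```python
# Illustrative sketch of the criticality test in claims 2 and 5;
# the mapping and constants are assumptions.
LIMIT_VALUE = 0.4  # assumed fixed limit value used for clamping

def criticality_limit(packet_loss_rate):
    """Calculate a limit from channel state, compare it with a fixed limit
    value, and select that value when the comparison so indicates (the
    selection step recited in claim 5)."""
    value = 1.0 - 2.0 * packet_loss_rate  # assumed channel-state mapping
    return value if value >= LIMIT_VALUE else LIMIT_VALUE

def is_critical(criticality_measure, packet_loss_rate):
    """Per claims 2 and 5: the first frame is treated as critical when its
    criticality measure exceeds the computed limit. A lossier channel
    lowers the limit, so more frames receive redundant protection."""
    return criticality_measure > criticality_limit(packet_loss_rate)
```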
|